Abstract

Jointly Gaussian memoryless sources are observed at N distinct terminals. The
goal is to efficiently encode the observations in a distributed fashion so as to enable
reconstruction of any one of the observations, say the first one, at the decoder
subject to a quadratic fidelity criterion. Our main result is a precise characterization of
the rate-distortion region when the covariance matrix of the sources satisfies a
"tree-structure" condition. In this situation, a natural analog-digital separation scheme
optimally trades off the distributed quantization rate tuples and the distortion in the
reconstruction: each encoder consists of a point-to-point Gaussian vector quantizer
followed by a Slepian-Wolf binning encoder. We also provide a partial converse that
suggests that the tree-structure condition is fundamental.



1 Introduction 

The focus of this study is the problem of distributed source coding of memoryless Gaussian
sources with quadratic distortion constraints. The rate-distortion region of this problem with
two terminals has been recently characterized [13]. Our focus, hence, is on the case when
there are at least 3 terminals. In this paper, we study a special case of this general problem:
the so-called "many-help-one" situation depicted in Figure 1. The setup is the following:
Figure 1: The many-help-one problem.



• Sources: Each of the N encoders observes a memoryless discrete-time source: encoder
$i$ observes, over $n$ discrete time instants, the memoryless source $x_i^n$. The observations
across the encoders are correlated, however. Specifically, the joint observations at
time $m$, $(x_1(m), \ldots, x_N(m))$, are jointly Gaussian. Further, the joint observations are
memoryless over time $m$.

• Encoders: Each encoder $i$ maps the vector of analog observations (over $n$ time instants,
say) into a vector of bits (of length $R_i n$, say) that is then communicated without loss
to a single decoder (on a link with rate $R_i$).

• Decoder: The decoder is only interested in reconstructing one of the sources (say, $x_1^n$).
The fidelity criterion considered here is a quadratic one: the average (over the statistics
of the sources) squared $\ell_2$ distance between the original source vector and the reconstructed
vector is required to be no more than $nD$.

• Problem statement: The problem is to characterize the set of rate tuples at which
the encoders can communicate with the decoder while still conveying enough information
to satisfy the quadratic distortion constraint on the reconstruction.

In this paper, we precisely characterize the rate-distortion region of a class of many-help-one
problems. A crucial step towards solving this problem involves the introduction of a
related distributed source coding problem where the source has a "binary tree" structure;
this is done in Section 2. We show that the natural analog-digital separation strategy of
point-to-point Gaussian vector quantization followed by a distributed Slepian-Wolf binning
scheme is optimal for this problem (this is done in Sections 2.3 and 2.4). Next, we show how
this result can be used to solve various instances of the many-help-one problem of interest;
this is done in Section 3. Finally, various ancillary aspects of the problem at hand are
discussed in Section 4; specifically, the worst-case property of the Gaussian distribution with
respect to the analog-digital separation architecture is demonstrated and a partial converse
for the necessity of the tree-structure condition is provided.

2 The Binary Tree Structure Problem 

In this section, we take a short detour away from the many-help-one problem of interest (cf.
Figure 1). Specifically, we introduce a related distributed source coding problem that we
call the "binary tree structure problem". We show that the natural analog-digital separation
architecture is optimal in terms of the rate-distortion tradeoff for this problem. The connection
between the original many-help-one problem and this binary tree structure problem is
made in the next section.

The outline of this section is as follows: 

• we introduce the source variables and their statistical relationships first (Section 2.1);

• next we specify precisely the binary tree structure problem (Section 2.2);

• we evaluate the performance of the natural analog-digital architecture in terms of the
rate-distortion tradeoff for the binary tree structure problem (Section 2.3);

• under the assumption that certain variables have positive variance, we derive a novel
outer bound to the rate-distortion region; this involves a careful use of the entropy-power
inequality (extracting critical ideas from [9]) and is one of the most important
technical contributions of this paper (Section 2.4);

• again under the positive variance assumption, we show that the outer bound to the
rate-distortion region indeed matches the inner bound derived by evaluating the natural
analog-digital separation architecture (Section 2.4);

• using a continuity argument, we relax the positive variance assumption and show
that the separation architecture is optimal for all binary tree structure problems
(Section 2.5);

• finally, we show that Gaussian sources are the worst case in the sense that a
non-Gaussian source has a rate-distortion region at least as large as that of a Gaussian
source with the same covariance matrix, so long as the Gaussian source satisfies the tree
structure (Section 2.6).



2.1 Binary Gauss-Markov Trees 

Consider the Markov binary tree structure of Gaussian random variables depicted in Figure 2.
Formally, the Gauss-Markov tree structure represents the following Markov chain conditions.
Consider the node denoted by the random variable $x_i^{(k)}$. We define the set of left descendants,
the set of right descendants, and the tree of $x_i^{(k)}$ to be

$$\mathcal{L}(x_i^{(k)}) = \{x_j^{(l)} : k < l \le L,\ 2^{l-k}(i-1) < j \le 2^{l-k}(i-0.5)\},$$
$$\mathcal{R}(x_i^{(k)}) = \{x_j^{(l)} : k < l \le L,\ 2^{l-k}(i-0.5) < j \le 2^{l-k}\, i\},$$
$$\mathcal{T}(x_i^{(k)}) = \{x_i^{(k)}\} \cup \mathcal{L}(x_i^{(k)}) \cup \mathcal{R}(x_i^{(k)}),$$

respectively. We define the set of nodes $\mathcal{P}(x_i^{(k)})$ to be

$$\mathcal{P}(x_i^{(k)}) = \{x_j^{(l)} : \text{all } l, j\} \setminus \mathcal{T}(x_i^{(k)}).$$

Then, by definition, the Markov chain condition given by Figure 2 says that conditioned on
the random variable $x_i^{(k)}$, the sets of random variables $\mathcal{P}(x_i^{(k)})$, $\mathcal{L}(x_i^{(k)})$, and $\mathcal{R}(x_i^{(k)})$ are
independent; further, this is true for all pairs $(i, k)$.
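The index arithmetic in these definitions can be checked mechanically. The following Python sketch, assuming nodes are represented as hypothetical `(level, index)` pairs $(k, i)$ with $1 \le k \le L$ and $1 \le i \le 2^{k-1}$, lists $\mathcal{L}$, $\mathcal{R}$, $\mathcal{T}$, and $\mathcal{P}$ directly from the inequalities above; the function names are illustrative only.

```python
def all_nodes(L):
    """Every node (k, i) of a binary tree of depth L."""
    return {(k, i) for k in range(1, L + 1) for i in range(1, 2 ** (k - 1) + 1)}


def left_desc(k, i, L):
    """Left descendants: levels l > k with 2^(l-k)(i-1) < j <= 2^(l-k)(i-1/2)."""
    return {(l, j) for (l, j) in all_nodes(L)
            if l > k and 2 ** (l - k) * (i - 1) < j <= 2 ** (l - k) * (i - 0.5)}


def right_desc(k, i, L):
    """Right descendants: levels l > k with 2^(l-k)(i-1/2) < j <= 2^(l-k) i."""
    return {(l, j) for (l, j) in all_nodes(L)
            if l > k and 2 ** (l - k) * (i - 0.5) < j <= 2 ** (l - k) * i}


def subtree(k, i, L):
    """T(x_i^(k)): the node itself plus all of its descendants."""
    return {(k, i)} | left_desc(k, i, L) | right_desc(k, i, L)


def outside(k, i, L):
    """P(x_i^(k)): every node that is not in the tree of x_i^(k)."""
    return all_nodes(L) - subtree(k, i, L)


# Depth-3 example: node (2, 1) has children (3, 1) and (3, 2).
print(left_desc(2, 1, 3), right_desc(2, 1, 3))
```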
Figure 2: The binary tree structure.
2.1.1 A Specific Construction 

Now consider the following specific construction of the $x_i^{(k)}$'s that satisfies the Markov chain
structure in Figure 2. Let $m$, $k$, and $i$ denote the time index, the tree depth index, and the
node within the tree depth index, respectively. Then define

$$x_{2i-1}^{(k+1)}(m) = \alpha_{2i-1}^{(k+1)}\, x_i^{(k)}(m) + n_{2i-1}^{(k+1)}(m), \tag{1}$$
$$x_{2i}^{(k+1)}(m) = \alpha_{2i}^{(k+1)}\, x_i^{(k)}(m) + n_{2i}^{(k+1)}(m), \tag{2}$$

where the indices vary as

$$m = 1, \ldots, n, \tag{3}$$
$$k = 1, \ldots, L-1, \tag{4}$$
$$i = 1, \ldots, 2^{k-1}. \tag{5}$$

Here $\alpha_{2i-1}^{(k+1)}$ and $\alpha_{2i}^{(k+1)}$ are real numbers. The random variables

$$n_i^{(k)}(m), \quad k = 2, \ldots, L,\ i = 1, \ldots, 2^{k-1},\ m = 1, \ldots, n, \tag{6}$$

are independent Gaussian random variables (with zero mean and variance $\sigma^2_{n_i^{(k)}}$ for the index
pair $(k, i)$ and any $m$). Further, these random variables are all independent of the root
random variables $\{x_1^{(1)}(m),\ m = 1, \ldots, n\}$.
Finally, let the root random variables $\{x_1^{(1)}(m),\ m = 1, \ldots, n\}$
be a collection of i.i.d. Gaussian random variables with zero mean and variance $\sigma^2_{x_1^{(1)}}$. From
this construction, it readily follows that the random variables satisfy the tree structure in
Figure 2. Formally:

Claim 1 For this construction, the $x_i^{(k)}$ satisfy the Markov chain conditions in Figure 2.
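As a sanity check on this construction, the covariance of any two nodes follows from walking both nodes up to their lowest common ancestor: each step up multiplies by the corresponding $\alpha$, and the noises on different branches are independent. The Python sketch below assumes hypothetical dictionaries `alpha` and `noise_var` keyed by `(k, i)`; it illustrates (1)-(2) and is not part of the original development.

```python
def parent(node):
    """Parent of x_i^(k): children 2i-1 and 2i share parent i."""
    k, i = node
    return (k - 1, (i + 1) // 2)


def node_variances(L, alpha, noise_var, root_var):
    """Var(x_i^(k)) from (1)-(2): Var(child) = alpha^2 Var(parent) + Var(noise)."""
    var = {(1, 1): root_var}
    for k in range(2, L + 1):
        for i in range(1, 2 ** (k - 1) + 1):
            a = alpha[(k, i)]
            var[(k, i)] = a * a * var[parent((k, i))] + noise_var[(k, i)]
    return var


def covariance(a, b, alpha, var):
    """Cov(x_a, x_b): product of the alphas on both paths to the lowest common ancestor."""
    gain_a = gain_b = 1.0
    while a != b:
        if a[0] >= b[0]:          # move the deeper node (or a, on ties) one level up
            gain_a *= alpha[a]
            a = parent(a)
        else:
            gain_b *= alpha[b]
            b = parent(b)
    return gain_a * gain_b * var[a]


# Hypothetical depth-3 example: every alpha = 0.5, every noise variance = 1, root variance = 4.
L = 3
alpha = {(k, i): 0.5 for k in range(2, L + 1) for i in range(1, 2 ** (k - 1) + 1)}
noise_var = {node: 1.0 for node in alpha}
var = node_variances(L, alpha, noise_var, 4.0)
print(var[(3, 1)], covariance((3, 1), (3, 3), alpha, var))
```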
2.1.2 Necessity of Construction 

Conversely, this is also the most general way of constructing jointly Gaussian random
variables that satisfy the binary tree structure. We state this formally below:

Claim 2 Any zero-mean, jointly Gaussian $\{x_i^{(k)},\ k = 1, \ldots, L,\ i = 1, \ldots, 2^{k-1}\}$ that satisfy
the Markov tree structure in Figure 2 can be represented using the above construction
(cf. Equations (1) and (2)).

Proof: The steps are routine. For a fixed $1 \le k < L$ and $1 \le i \le 2^{k-1}$, consider the Gaussian
random variable $x_{2i-1}^{(k+1)}$. Since it is jointly Gaussian with all of the variables in $\mathcal{P}(x_{2i-1}^{(k+1)})$, we
can write

$$x_{2i-1}^{(k+1)} = E\left[x_{2i-1}^{(k+1)} \,\middle|\, \mathcal{P}(x_{2i-1}^{(k+1)})\right] + n_{2i-1}^{(k+1)}. \tag{7}$$

Here the random variable $n_{2i-1}^{(k+1)}$ is Gaussian and independent of all the nodes in $\mathcal{P}(x_{2i-1}^{(k+1)})$.
Further, the conditional expectation in Equation (7) is a linear conditional expectation
that is particularly simple (this is due to the Markov chain conditions imposed by
the tree structure): specifically, conditioned on $x_i^{(k)}$, the random variable of focus, $x_{2i-1}^{(k+1)}$, is
independent of all the other variables in $\mathcal{P}(x_{2i-1}^{(k+1)})$. Thus we can write

$$E\left[x_{2i-1}^{(k+1)} \,\middle|\, \mathcal{P}(x_{2i-1}^{(k+1)})\right] = E\left[x_{2i-1}^{(k+1)} \,\middle|\, x_i^{(k)}\right] = \alpha_{2i-1}^{(k+1)}\, x_i^{(k)} \tag{8}$$

for some real number $\alpha_{2i-1}^{(k+1)}$. Substituting Equation (8) in Equation (7), we have derived
Equation (1). The derivation of Equation (2) is analogous. Since $n_i^{(k)}$ is independent of
$\mathcal{P}(x_i^{(k)})$ for all $i$ and $k$, the required independence conditions hold and the conclusion follows.
□

2.2 Problem Statement 

Denote the vector

$$x_i^{(k)n} \triangleq \left(x_i^{(k)}(1), \ldots, x_i^{(k)}(n)\right). \tag{9}$$

Figure 3: The problem setup.



Similar notation will be used for other vectors to be introduced later. Consider the following
distributed source coding problem depicted in Figure 3. There are $2^{L-1}$ distributed encoders,
each having access to a memoryless observation sequence: encoder $i$ observes the memoryless
random process $x_i^{(L)n}$. Each encoder maps its observation into a discrete set
(encoder $i$ maps its length-$n$ observation into a discrete set $C_i$). The encoded observation is
then conveyed to the central decoder on rate-constrained links. The rate of communication
from encoder $i$ to the decoder is

$$\frac{1}{n}\log|C_i|.$$

The decoder forms an estimate $\hat{x}_1^{(1)n}$ of the root of the binary tree, $x_1^{(1)n}$, based on the messages
in $C_1, \ldots, C_{2^{L-1}}$. The average distortion of the reconstruction is



$$\frac{1}{n}\sum_{m=1}^{n} E\left[\left(x_1^{(1)}(m) - \hat{x}_1^{(1)}(m)\right)^2\right].$$

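The goal is to characterize the set of achievable rates and distortions $(R_1, \ldots, R_{2^{L-1}}, d)$, i.e.,
those for which there exist an encoder and a decoder such that

$$R_i \ge \frac{1}{n}\log|C_i| \quad \text{for all } i$$

and

$$d \ge \frac{1}{n}\sum_{m=1}^{n} E\left[\left(x_1^{(1)}(m) - \hat{x}_1^{(1)}(m)\right)^2\right].$$

We denote the closure of this set by $\mathcal{RD}^*$.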
We note that two special cases of this problem have been resolved in the literature: 

• L = 1 is the single-user Gaussian source coding problem with quadratic distortion, 

• L = 2 is the Gaussian CEO problem solved in [TJ E]. 
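For reference, the $L = 1$ case reduces to the classical rate-distortion function of a single Gaussian source of variance $\sigma^2_{x_1^{(1)}}$ under squared error, which in nats reads:

```latex
R(d) =
\begin{cases}
  \dfrac{1}{2}\log\dfrac{\sigma^2_{x_1^{(1)}}}{d}, & 0 < d \le \sigma^2_{x_1^{(1)}},\\[6pt]
  0, & d > \sigma^2_{x_1^{(1)}}.
\end{cases}
```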
The recent work in [8] studies a special case of the general tree structure depicted in Figure 2.¹
While a general outer bound is derived in [8] for that special case of the tree structure, it is 
shown to be tight only for a certain range of the parameters in the problem (the distortion 
constraint and the covariance matrix of the Gaussian sources). 

Our main result is that a natural strategy of point-to-point Gaussian vector quantization 
followed by Slepian-Wolf binning is optimal for any L. In the next section we formally 
present the natural achievable strategy and then state our main result. In the subsequent 
section, we prove a novel outer bound and use it to establish the main result. 

2.3 Analog-Digital Separation Strategy 

The natural achievable analog-digital separation strategy is depicted in Figure 4: each
encoder first vector quantizes the observation as in point-to-point Gaussian rate-distortion
theory, and then codes the quantizer outputs using a Slepian-Wolf binning scheme.

Figure 4: The natural separation scheme.

The rate tuples needed by this architecture to satisfy the distortion constraint can be calculated by
the so-called Berger-Tung inner bound [jQ]: let



$$u = (u_1, u_2, \ldots, u_{2^{L-1}}) \tag{10}$$

denote a vector of $2^{L-1}$ jointly Gaussian random variables. Consider the set $\mathcal{U}(d)$ of $u$ such
that

1. For each $i = 1, \ldots, 2^{L-1}$, $u_i$ satisfies

$$u_i = a_i x_i^{(L)} + w_i, \tag{11}$$

where $a_1, \ldots, a_{2^{L-1}}$ are constants and $w_1, \ldots, w_{2^{L-1}}$ are independent zero-mean Gaussian
random variables that are also independent of the $x_i^{(k)}$'s. It is convenient to assume
that $a_i \in [0, 1]$ and that $w_i$ has variance $(1 - a_i^2)\sigma^2_{x_i^{(L)}}$, so that $x_i^{(L)}$ and $u_i$ have the
same variance. This assumption incurs no loss of generality.



¹As an aside, we note that the material in [8] along with our own previous work [13] provided the impetus
to the present work.
2. $u$ satisfies

$$E\left[\left(x_1^{(1)} - E[x_1^{(1)} \mid u]\right)^2\right] \le d. \tag{12}$$
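Condition (12) is a linear-MMSE computation: for jointly Gaussian $(x_1^{(1)}, u)$ the error variance is $\mathrm{Var}(x_1^{(1)}) - c^\top K_u^{-1} c$, where $c$ collects the covariances between $x_1^{(1)}$ and the $u_i$, and $K_u$ is the covariance matrix of $u$. A dependency-free Python sketch (the helper names are hypothetical):

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):        # back substitution
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def mmse(var_x, cov_xu, cov_u):
    """E[(x - E[x|u])^2] = Var(x) - c^T K_u^{-1} c for jointly Gaussian (x, u)."""
    y = solve(cov_u, cov_xu)
    return var_x - sum(c * yi for c, yi in zip(cov_xu, y))


# Root of unit variance seen through two leaves x_j = x + n_j (unit noise), with u_j = x_j.
print(mmse(1.0, [1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]]))
```

In this example $a_1 = a_2 = 1$, so the test channel is noiseless and the error equals the familiar $1/(1 + 1 + 1) = 1/3$.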



Now, consider

$$A \subseteq \{1, \ldots, 2^{L-1}\}. \tag{13}$$

Denote the set

$$\{u_i : i \in A\} = u_A. \tag{14}$$

Similar notation will be used for other vectors introduced later. We now have:

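Lemma 1 [Berger-Tung inner bound JT$] The analog-digital separation architecture achieves
the convex hull of the rate-distortion region

$$\mathcal{RD}_{\rm in} \triangleq \Big\{(R_1, \ldots, R_{2^{L-1}}, d) : \exists\, u \in \mathcal{U}(d) \text{ such that } \forall A \subseteq \{1, \ldots, 2^{L-1}\},\ \sum_{i\in A} R_i \ge I\left(x_A^{(L)}; u_A \mid u_{A^c}\right)\Big\}. \tag{15}$$

In particular, $\mathcal{RD}^*$ contains $\mathrm{co}(\mathcal{RD}_{\rm in})$, where $\mathrm{co}(\cdot)$ denotes the closure of the convex hull.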

The region $\mathcal{RD}_{\rm in}$ can be explicitly computed for a given covariance matrix of the observed
Gaussian sources. This computation is aided by the following combinatorial structure of the
set $\mathcal{RD}_{\rm in}$.

2.3.1 Combinatorial Structure of $\mathcal{RD}_{\rm in}$

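Consider a specific $u \in \mathcal{U}(d)$ (this parameterizes a specific choice of the analog-digital
separation architecture) and the rate tuples $(R_1, \ldots, R_{2^{L-1}})$ that satisfy the conditions

$$\sum_{i\in A} R_i \ge f(A), \quad \forall A \subseteq \{1, \ldots, 2^{L-1}\}, \tag{16}$$

where

$$f(A) \triangleq I\left(x_A^{(L)}; u_A \mid u_{A^c}\right). \tag{17}$$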

Consider the following properties of the set function $f$ for all $A_1, A_2 \subseteq \{1, \ldots, 2^{L-1}\}$. We
have $f(\emptyset) = 0$.

Lemma 2

$$f(A_1) \ge 0, \tag{18}$$
$$f(A_1 \cup \{t\}) \ge f(A_1), \quad \forall t \in \{1, \ldots, 2^{L-1}\}, \tag{19}$$
$$f(A_1 \cup A_2) + f(A_1 \cap A_2) \ge f(A_1) + f(A_2). \tag{20}$$



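Proof: Equation (18) follows from the non-negativity of mutual information. Equation (19)
follows from the chain rule of mutual information: for $t \notin A_1$, we have

$$f(A_1 \cup \{t\}) = I\left(u_{A_1}; x_{A_1}^{(L)}, x_t^{(L)} \mid u_{A_1^c}\right) + I\left(u_t; x_{A_1}^{(L)}, x_t^{(L)} \mid u_{(A_1\cup\{t\})^c}\right)$$
$$\ge I\left(u_{A_1}; x_{A_1}^{(L)}, x_t^{(L)} \mid u_{A_1^c}\right) = f(A_1),$$

where the last equality holds because, given $x_{A_1}^{(L)}$, $u_{A_1}$ is independent of $(x_t^{(L)}, u_{A_1^c})$.
Finally, consider (20). Let

$$B = \{i : \mathrm{Var}(u_i \mid x_i^{(L)}) > 0\}.$$

Suppose $i \in (A_1 \cup A_2) \cap B^c$. If $\mathrm{Var}(x_i^{(L)} \mid u_{(A_1\cup A_2)^c}) > 0$, then $f(A_1 \cup A_2) = \infty$, so (20)
trivially holds. If $\mathrm{Var}(x_i^{(L)} \mid u_{(A_1\cup A_2)^c}) = 0$, then

$$\mathrm{Var}(x_i^{(L)} \mid u_{A_1^c}) = \mathrm{Var}(x_i^{(L)} \mid u_{A_2^c}) = \mathrm{Var}(x_i^{(L)} \mid u_{(A_1\cap A_2)^c}) = 0,$$

so $A_1$ and $A_2$ can be replaced with $A_1 \setminus \{i\}$ and $A_2 \setminus \{i\}$, respectively, without affecting the
validity of (20). By repeating this process as many times as necessary, we may assume that

$$A_1 \cup A_2 \subseteq B.$$

This case requires the use of the Markov property satisfied by $u$: in particular, we have
by construction

$$u_i \leftrightarrow x_i^{(L)} \leftrightarrow \left(x_{\{i\}^c}^{(L)}, u_{\{i\}^c}\right),$$

meaning that these variables form a Markov chain in the specified order. Thus we can
write

$$h\left(u_{A_1} \mid x_{A_1}^{(L)}, u_{A_1^c}\right) = \sum_{i\in A_1} h\left(u_i \mid x_i^{(L)}\right). \tag{21}$$

Now we rewrite $f(A_1)$, using (21), as

$$f(A_1) = h(u_{A_1} \mid u_{A_1^c}) - h\left(u_{A_1} \mid x_{A_1}^{(L)}, u_{A_1^c}\right) \tag{22}$$
$$\qquad = h(u) - h(u_{A_1^c}) - \sum_{i\in A_1} h\left(u_i \mid x_i^{(L)}\right). \tag{23}$$

It follows from (23) that we have shown (20) if

$$h(u_{A_1^c}) + h(u_{A_2^c}) \ge h\left(u_{(A_1\cup A_2)^c}\right) + h\left(u_{(A_1\cap A_2)^c}\right),$$

i.e.,

$$h\left(u_{A_2\setminus A_1} \mid u_{(A_1\cup A_2)^c}\right) \ge h\left(u_{A_2\setminus A_1} \mid u_{A_2^c}\right),$$

which is true since $(A_1 \cup A_2)^c \subseteq A_2^c$ and conditioning cannot increase the differential entropy. □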
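Equation (23) also gives a convenient way to evaluate $f$ numerically: for Gaussian variables the $(2\pi e)$ factors in the differential entropies cancel, leaving $f(A) = \frac12\big[\log\det K_u - \log\det K_{u_{A^c}} - \sum_{i\in A}\log \mathrm{Var}(w_i)\big]$ in nats. The Python sketch below builds $K_u = \mathrm{diag}(a)\,K_x\,\mathrm{diag}(a) + \mathrm{diag}(\mathrm{Var}(w))$ from a leaf covariance $K_x$ (a hypothetical input) and checks (18)-(20) by brute force over all pairs of subsets. It assumes every $\mathrm{Var}(w_i) > 0$, i.e., the case $A_1 \cup A_2 \subseteq B$.

```python
import itertools
import math


def logdet(K):
    """log det of a symmetric positive-definite matrix (Gaussian elimination, no pivoting)."""
    M = [list(map(float, row)) for row in K]
    n, total = len(M), 0.0
    for col in range(n):
        pivot = M[col][col]               # positive for SPD matrices
        total += math.log(pivot)
        for r in range(col + 1, n):
            factor = M[r][col] / pivot
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return total


def make_f(K_x, a, var_w):
    """Return f(A) = I(x_A; u_A | u_{A^c}) for u_i = a_i x_i + w_i, via Equation (23)."""
    n = len(a)
    K_u = [[a[i] * a[j] * K_x[i][j] + (var_w[i] if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    full = logdet(K_u)

    def f(A):
        comp = [i for i in range(n) if i not in A]
        K_comp = [[K_u[i][j] for j in comp] for i in comp]
        return 0.5 * (full - logdet(K_comp) - sum(math.log(var_w[i]) for i in A))
    return f


def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in itertools.combinations(range(n), r)]


def check_lemma2(f, n, tol=1e-9):
    """Brute-force check of (18) nonnegativity, (19) monotonicity, (20) supermodularity."""
    S = subsets(n)
    for A1 in S:
        if f(A1) < -tol:
            return False
        if any(f(A1 | {t}) < f(A1) - tol for t in range(n)):
            return False
        for A2 in S:
            if f(A1 | A2) + f(A1 & A2) < f(A1) + f(A2) - tol:
                return False
    return True


# Hypothetical 3-leaf example with the normalization Var(w_i) = (1 - a_i^2) Var(x_i).
K_x = [[2.0, 1.0, 0.5], [1.0, 2.0, 0.5], [0.5, 0.5, 2.0]]
a = [0.9, 0.7, 0.5]
var_w = [(1 - ai ** 2) * K_x[i][i] for i, ai in enumerate(a)]
f = make_f(K_x, a, var_w)
print(check_lemma2(f, 3))
```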
A polyhedron such as the one in (16), with the rank function $f$ satisfying the properties in
Lemma 2, is called a contra-polymatroid. A generic reference to the class of polyhedra called
matroids is [15], and applications to information theory are in [11], where natural achievable
regions of the multiple access channel are shown to be polymatroids, and in [3], [11], where
natural achievable regions are shown to be contra-polymatroids. An important property of
contra-polymatroids is summarized in Lemma 3.3 of [11]: the characterization of their vertices.
For $\pi$ a permutation on the set $\{1, \ldots, 2^{L-1}\}$, let

$$b_{\pi_i}^{(\pi)} = f(\{\pi_1, \pi_2, \ldots, \pi_i\}) - f(\{\pi_1, \pi_2, \ldots, \pi_{i-1}\}), \quad i = 1, \ldots, 2^{L-1},$$

and $b^{(\pi)} = (b_1^{(\pi)}, \ldots, b_{2^{L-1}}^{(\pi)})$. Then the $(2^{L-1})!$ points $\{b^{(\pi)} : \pi \text{ a permutation}\}$ are the
vertices of (and hence belong to) the contra-polymatroid (16). We use this result to conclude
that all of the constraints in (16) are tight for some rate tuple and that there is a computationally
simple way to find the vertex that minimizes a given linear functional of the rates [11].
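The vertex characterization translates directly into code. The Python sketch below computes $b^{(\pi)}$ for any set function given as a callable on frozensets and, for nonnegative weights $c_i$, picks the vertex minimizing $\sum_i c_i R_i$ by placing users in order of decreasing weight, which is the standard greedy rule for contra-polymatroids; a brute-force scan over all vertices is included only as a cross-check. The toy rank function $f(A) = (\sum_{i\in A} s_i)^2$ is a hypothetical, supermodular stand-in for (17).

```python
from itertools import permutations


def vertex(f, order):
    """b^(pi): user pi_i receives f({pi_1..pi_i}) - f({pi_1..pi_{i-1}})."""
    rates, prefix = {}, frozenset()
    for user in order:
        rates[user] = f(prefix | {user}) - f(prefix)
        prefix = prefix | {user}
    return rates


def greedy_min(f, weights):
    """Vertex minimizing sum_i weights[i] * R_i (weights >= 0): largest weight goes first."""
    order = sorted(range(len(weights)), key=lambda u: -weights[u])
    rates = vertex(f, order)
    return sum(weights[u] * r for u, r in rates.items()), rates


def brute_force_min(f, weights):
    """Cross-check: scan every vertex b^(pi)."""
    return min(sum(weights[u] * r for u, r in vertex(f, p).items())
               for p in permutations(range(len(weights))))


s = [1.0, 2.0, 0.5]                       # hypothetical toy data


def f(A):
    """Supermodular toy rank function (a convex function of a modular one)."""
    return sum(s[i] for i in A) ** 2


w = [3.0, 1.0, 2.0]
value, rates = greedy_min(f, w)
print(value, rates)
```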



2.4 An outer bound for a special case 



We first focus on the case in which $\sigma^2_{n_i^{(k)}} > 0$ for all $i$ and $k$. We abbreviate this condition
by saying that "all of the noise variances are positive." To derive our outer bound, we need
the following definitions:

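• Fix $1 \le k \le L-1$ and $1 \le i \le 2^{k-1}$ and define the function

$$g_i^{(k)}(r_1, r_2) \triangleq \frac{1}{2}\log\left[1 + \frac{(\alpha_{2i-1}^{(k+1)})^2\sigma^2_{x_i^{(k)}}}{\sigma^2_{n_{2i-1}^{(k+1)}}}\left(1 - e^{-2r_1}\right) + \frac{(\alpha_{2i}^{(k+1)})^2\sigma^2_{x_i^{(k)}}}{\sigma^2_{n_{2i}^{(k+1)}}}\left(1 - e^{-2r_2}\right)\right], \quad r_1, r_2 \ge 0. \tag{24}$$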
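To make (24) concrete, here is a direct Python transcription; the argument names are hypothetical and rates are in nats, matching the $e^{-2r}$ terms. The function is increasing in each argument and saturates at $\frac12\log(1 + \mathrm{snr}_1 + \mathrm{snr}_2)$ as $r_1, r_2 \to \infty$, where $\mathrm{snr}_j$ denotes the corresponding ratio in (24).

```python
import math


def g(r1, r2, alpha1, alpha2, var_parent, noise_var1, noise_var2):
    """g_i^(k)(r1, r2) from (24), in nats.

    alpha1, alpha2         -- alpha_{2i-1}^(k+1) and alpha_{2i}^(k+1)
    var_parent             -- variance of the parent node x_i^(k)
    noise_var1, noise_var2 -- variances of n_{2i-1}^(k+1), n_{2i}^(k+1) (assumed > 0)
    """
    snr1 = alpha1 ** 2 * var_parent / noise_var1
    snr2 = alpha2 ** 2 * var_parent / noise_var2
    return 0.5 * math.log(1.0 + snr1 * (1.0 - math.exp(-2.0 * r1))
                          + snr2 * (1.0 - math.exp(-2.0 * r2)))


# Zero noise-quantization rates give zero; large rates approach 0.5 * log(1 + 2 + 1).
print(g(0.0, 0.0, 1.0, 0.5, 2.0, 1.0, 0.5), g(50.0, 50.0, 1.0, 0.5, 2.0, 1.0, 0.5))
```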



For node $x_i^{(k)}$, we define the set of associated observations to be

$$O(x_i^{(k)}) = \{j : 2^{L-k}(i-1) < j \le 2^{L-k}\, i\}. \tag{25}$$

To each node in the binary tree structure of Figure 2 we associate a nonnegative
number, known as its noise-quantization rate. Specifically, we associate $r_i^{(k)}$ with the node
$x_i^{(k)}$. A physical interpretation for the nomenclature "noise-quantization rate" will be
available during the proof of the outer bound.

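For each node $x_i^{(k)}$, define the set $r_{A,A^c}(x_i^{(k)})$ to be the set of noise-quantization rates
(say, $r_j^{(l)}$) of the variables (say, $x_j^{(l)}$) in the tree of $x_i^{(k)}$ whose associated observations are
entirely in $A$ or entirely in $A^c$, and are such that none of the ancestors of $x_j^{(l)}$ have this property.
Formally,

$$r_{A,A^c}(x_i^{(k)}) = \Big\{r_j^{(l)} : x_j^{(l)} \in \mathcal{T}(x_i^{(k)}),\ O(x_j^{(l)}) \subseteq A \text{ or } O(x_j^{(l)}) \subseteq A^c,\ \nexists\, x_b^{(a)} \in \mathcal{T}(x_i^{(k)}) \text{ with } O(x_b^{(a)}) \subseteq A \text{ or } O(x_b^{(a)}) \subseteq A^c \text{ and } x_j^{(l)} \in \mathcal{R}(x_b^{(a)}) \cup \mathcal{L}(x_b^{(a)})\Big\}.$$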
Likewise, we let $r_A(x_i^{(k)})$ denote the set of noise-quantization rates of variables in the
tree of $x_i^{(k)}$ whose associated observations are entirely in $A$ and are such that none of
the ancestors have this property. Formally,

$$r_A(x_i^{(k)}) = \Big\{r_j^{(l)} : x_j^{(l)} \in \mathcal{T}(x_i^{(k)}),\ O(x_j^{(l)}) \subseteq A,\ \nexists\, x_b^{(a)} \in \mathcal{T}(x_i^{(k)}) \text{ with } O(x_b^{(a)}) \subseteq A \text{ and } x_j^{(l)} \in \mathcal{R}(x_b^{(a)}) \cup \mathcal{L}(x_b^{(a)})\Big\}. \tag{26}$$
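Both sets can be computed by a top-down search: starting from $x_i^{(k)}$, a node whose associated observations satisfy the stopping rule is kept and its subtree skipped (so no kept node has a kept ancestor); otherwise the search descends to both children. Every leaf has a singleton observation set, so the search for $r_{A,A^c}$ always stops at or above the leaves. The Python sketch below returns the nodes $(l, j)$ whose rates make up each set; the helper names are hypothetical and indices follow (25).

```python
def observations(k, i, L):
    """O(x_i^(k)) from (25): indices j of the leaves x_j^(L) below x_i^(k)."""
    return set(range(2 ** (L - k) * (i - 1) + 1, 2 ** (L - k) * i + 1))


def topmost(k, i, L, keep):
    """Topmost nodes of T(x_i^(k)) whose observation set satisfies `keep`."""
    if keep(observations(k, i, L)):
        return {(k, i)}
    if k == L:                       # a leaf that fails the rule contributes nothing
        return set()
    return topmost(k + 1, 2 * i - 1, L, keep) | topmost(k + 1, 2 * i, L, keep)


def nodes_A_Ac(k, i, L, A):
    """Nodes whose rates form r_{A,A^c}(x_i^(k)): observations all in A or all in A^c."""
    return topmost(k, i, L, lambda O: O <= A or not (O & A))


def nodes_A(k, i, L, A):
    """Nodes whose rates form r_A(x_i^(k)): observations all in A."""
    return topmost(k, i, L, lambda O: O <= A)


# Depth-3 tree (leaves 1..4) with A = {1}.
print(nodes_A_Ac(1, 1, 3, {1}), nodes_A(1, 1, 3, {1}))
```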



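Define the following set of noise-quantization rates $\{r_i^{(k)},\ 1 \le k \le L,\ 1 \le i \le 2^{k-1}\}$:

$$\mathcal{F}_r(d) = \Big\{r_i^{(k)} \ge 0,\ r_1^{(1)} \ge \frac{1}{2}\log\frac{\sigma^2_{x_1^{(1)}}}{d},\ r_i^{(k)} \le g_i^{(k)}\big(r_{2i-1}^{(k+1)}, r_{2i}^{(k+1)}\big) \text{ for } k \le L-1\Big\}. \tag{27}$$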

We next implicitly define a collection of functions of the noise-quantization rates. Consider
a set of noise-quantization rates $\{r_i^{(k)},\ 1 \le k \le L,\ 1 \le i \le 2^{k-1}\}$ in $\mathcal{F}_r(d)$. Then
for any $i$ and $k < L$, we have

$$r_i^{(k)} \le g_i^{(k)}\big(r_{2i-1}^{(k+1)}, r_{2i}^{(k+1)}\big).$$

Since $g_i^{(k)}$ is increasing in both arguments, this implies

$$r_i^{(k)} \le g_i^{(k)}\Big(g_{2i-1}^{(k+1)}\big(r_{4i-3}^{(k+2)}, r_{4i-2}^{(k+2)}\big),\ r_{2i}^{(k+1)}\Big),$$
$$r_i^{(k)} \le g_i^{(k)}\Big(g_{2i-1}^{(k+1)}\big(r_{4i-3}^{(k+2)}, r_{4i-2}^{(k+2)}\big),\ g_{2i}^{(k+1)}\big(r_{4i-1}^{(k+2)}, r_{4i}^{(k+2)}\big)\Big).$$

By repeating this substitution process, we may obtain an upper bound on $r_i^{(k)}$ in terms
of the noise-quantization rates in $r_{A,A^c}(x_i^{(k)})$. We implicitly define

$$g_{i,A,A^c}^{(k)}\big(r_{A,A^c}(x_i^{(k)})\big) \tag{28}$$

to be this upper bound. (By convention, if $r_i^{(k)} \in r_{A,A^c}(x_i^{(k)})$, then we define this upper bound to be
$r_i^{(k)}$ itself.) We then let

$$g_{i,A}^{(k)}\big(r_A(x_i^{(k)})\big)$$

denote the function of $r_A(x_i^{(k)})$ obtained by evaluating the function in (28) with all
of the noise-quantization rates in

$$r_{A,A^c}(x_i^{(k)}) \setminus r_A(x_i^{(k)})$$

set equal to zero. The significance of this function will be apparent in the proof of the
outer bound.
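The repeated substitution that defines (28) is a post-order recursion: a node whose rate belongs to $r_{A,A^c}(x_i^{(k)})$ is a "frozen" input, and every other node is replaced by $g$ applied to the bounds of its two children. The Python sketch below assumes the frozen node set has already been computed and takes $g$ as a callable; the toy `g` in the example, which simply adds its arguments, is a hypothetical increasing stand-in for (24). Zeroing the frozen rates outside $r_A(x_i^{(k)})$ gives the second function described above.

```python
def composed_bound(node, L, rates, frozen, g):
    """Upper bound on the rate at `node` from repeated substitution r <= g(left, right).

    rates  -- dict (k, i) -> noise-quantization rate
    frozen -- set of nodes whose rates form r_{A,A^c} of the starting node
    g      -- callable g(node, left_bound, right_bound), increasing in both bounds
    """
    if node in frozen:               # convention: a frozen node bounds itself
        return rates[node]
    k, i = node
    if k == L:
        raise ValueError(f"leaf {node} is not frozen; check the frozen set")
    left = composed_bound((k + 1, 2 * i - 1), L, rates, frozen, g)
    right = composed_bound((k + 1, 2 * i), L, rates, frozen, g)
    return g(node, left, right)


def bound_with_zeros(node, L, rates, frozen, frozen_A, g):
    """Same bound with every frozen rate outside r_A set to zero."""
    zeroed = {n: (rates[n] if n in frozen_A else 0.0) for n in frozen}
    return composed_bound(node, L, zeroed, frozen, g)


def toy_g(node, left, right):
    """Hypothetical increasing stand-in for g_i^(k)."""
    return left + right


frozen = {(3, 1), (3, 2), (2, 2)}          # r_{A,A^c}(x_1^(1)) for L = 3, A = {1}
rates = {(3, 1): 1.0, (3, 2): 2.0, (2, 2): 5.0}
print(composed_bound((1, 1), 3, rates, frozen, toy_g))
print(bound_with_zeros((1, 1), 3, rates, frozen, {(3, 1)}, toy_g))
```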

• For any set

$$A \subseteq \{1, 2, \ldots, 2^{L-1}\}, \tag{29}$$

we define the ancestors set at level $k$ to be

$$A^{(k)} = \{i : O(x_i^{(k)}) \cap A \ne \emptyset\}, \tag{30}$$

where $\emptyset$ denotes the empty set.
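Since $O(x_i^{(k)})$ is a block of $2^{L-k}$ consecutive leaf indices, $i \in A^{(k)}$ exactly when some $j \in A$ satisfies $\lceil j / 2^{L-k} \rceil = i$. A short Python sketch of (30), with hypothetical function names, computes it both from the definition and from this closed form:

```python
def ancestor_set(A, k, L):
    """A^(k) = {i : O(x_i^(k)) intersects A}, straight from (30)."""
    width = 2 ** (L - k)
    return {i for i in range(1, 2 ** (k - 1) + 1)
            if any(width * (i - 1) < j <= width * i for j in A)}


def ancestor_set_closed_form(A, k, L):
    """Equivalent form: the level-k ancestor of leaf j has index ceil(j / 2^(L-k))."""
    width = 2 ** (L - k)
    return {-(-j // width) for j in A}     # integer ceiling division


print(ancestor_set({1, 4}, 2, 3), ancestor_set_closed_form({1, 4}, 2, 3))
```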
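Consider the following region, $\mathcal{RD}_{\rm out}$, defined as

$$\mathcal{RD}_{\rm out} = \Big\{(R_1, \ldots, R_{2^{L-1}}, d) : \exists\, \{r_i^{(k)}\} \in \mathcal{F}_r(d) \text{ such that } \forall A \subseteq \{1, \ldots, 2^{L-1}\},\ \sum_{i\in A} R_i \ge \sum_{k=1}^{L} \sum_{i\in A^{(k)}} \cdots \Big\}. \tag{31}$$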

This constitutes an outer bound to the rate-distortion region of the binary tree structure
problem:

Lemma 3 For the binary tree structure problem in which all of the noise variances are
positive,

$$\mathcal{RD}^* \subseteq \mathcal{RD}_{\rm out}. \tag{32}$$

Proof: See Appendix A.

We next show that the outer bound just derived matches the inner bound derived from
the analog-digital separation architecture (cf. Lemma 1). Recall that we use $\mathrm{co}(\cdot)$ to denote
the closure of the convex hull of a given set.

Lemma 4 For the binary tree structure problem in which all of the noise variances are
positive,

$$\mathcal{RD}_{\rm out} = \mathrm{co}(\mathcal{RD}_{\rm in}). \tag{33}$$

Proof: See Appendix B.
2.5 Main Result



Using a continuity argument, one can relax the assumption that all of the noise variables
have positive variance. This allows us to conclude the first main result of this paper: the
optimality of the analog-digital separation architecture in achieving the rate-distortion region
of the binary tree structure problem.

Theorem 1 For the binary tree structure problem, the optimal rate-distortion region is
achieved by the analog-digital separation architecture:

$$\mathcal{RD}^* = \mathrm{co}(\mathcal{RD}_{\rm in}).$$

Proof: See Appendix C. □



2.6 Worst-Case Property

Up to this point we have assumed that the source variables are jointly Gaussian. In this
section, we justify this assumption by showing that the rate-distortion regions for other
distributions with the same covariance can only be larger.

Let $\{x_i^{(k)}\}$ be a Gaussian source satisfying the tree structure as before. Let

$$\{\tilde{x}_1^{(1)}, \tilde{x}_1^{(L)}, \ldots, \tilde{x}_{2^{L-1}}^{(L)}\}$$

be an alternate source with the same covariance as

$$\{x_1^{(1)}, x_1^{(L)}, \ldots, x_{2^{L-1}}^{(L)}\}.$$

Note that the alternate source need not be part of a Markov tree. Let $\widetilde{\mathcal{RD}}^*$ denote the
rate-distortion region of the alternate source.

The separation-based architecture yields an inner bound on the rate-distortion region
of the alternate source. Specifically, let $\widetilde{\mathcal{RD}}_{\rm in}$ denote the region obtained by replacing
$\{x_1^{(1)}, x_1^{(L)}, \ldots, x_{2^{L-1}}^{(L)}\}$ with $\{\tilde{x}_1^{(1)}, \tilde{x}_1^{(L)}, \ldots, \tilde{x}_{2^{L-1}}^{(L)}\}$ in the discussion in Section 2.3. Then

$$\mathrm{co}(\widetilde{\mathcal{RD}}_{\rm in}) \subseteq \widetilde{\mathcal{RD}}^*.$$



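Theorem 2 A Gaussian source satisfying the binary tree structure has the smallest
rate-distortion region for its covariance:

$$\mathcal{RD}^* \subseteq \widetilde{\mathcal{RD}}^*.$$

In fact, the separation-based architecture has the most difficulty compressing a Gaussian
source in the sense that

$$\mathcal{RD}^* = \mathrm{co}(\mathcal{RD}_{\rm in}) \subseteq \mathrm{co}(\widetilde{\mathcal{RD}}_{\rm in}) \subseteq \widetilde{\mathcal{RD}}^*. \tag{34}$$

Proof: See Appendix D. □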
3 Tree Structure and the Many-Help-One Problem



We now turn to the main problem of interest: the many-help-one distributed source coding
problem. As in the tree structure problem, there is a natural analog-digital separation
architecture that is a candidate solution. This is illustrated in Figure 5.

Figure 5: The natural analog-digital separation architecture.



3.1 Main Result 

Our main result is a sufficient condition under which the analog-digital separation
architecture is optimal. To state it, we first define a general Gauss-Markov tree: it is made up of
jointly Gaussian random variables and respects the Markov conditions implied by the tree
structure. The only extra feature compared to the binary Gauss-Markov tree (cf. Figure 2)
is that each node can have any number of descendants (not just two).

Theorem 3 Consider the many-help-one distributed source coding problem illustrated in
Figure 1. Suppose the observations $x_1, \ldots, x_N$ can be embedded in a general Gauss-Markov
tree of size $M \ge N$. Then the natural analog-digital separation architecture (cf. Figure 5)
achieves the entire rate-distortion region.

Proof: The proof is elementary and builds heavily on Theorem 1. We outline the steps
below:

• A general Gauss-Markov tree can be recast as a (potentially larger) binary Gauss-Markov
tree with the root being identified with any specified node in the original tree.
To see this, we only need to observe that the Markov chain relations are the same no
matter which node is identified as the root.

• Next, by potentially increasing the height of the binary tree (to $L' \ge L$) we can ensure
that the observations $x_1, \ldots, x_N$ are a subset of the $2^{L'-1}$ leaves of the binary
Gauss-Markov tree. If an observation of interest, say $x_j$, is an intermediate node of the
binary Gauss-Markov tree, we can effectively make it a leaf by adding descendants that
are identical (almost surely) to $x_j$.
This allows us to convert the many-help-one problem into a binary tree structure problem
(with potentially more observations than we started out with). The analog-digital separation
architecture is optimal for this problem (cf. Theorem 1). By restricting the corresponding
rate-distortion region to the instance when the rates of the encoders corresponding to the
observations that were not part of the original $N$ are zero, we still have the optimality of the
analog-digital separation architecture. This latter rate-distortion region simply corresponds
to the many-help-one problem studied in Figure 1. This completes the proof. □
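The first two steps of the proof can be sketched in Python, assuming the general Gauss-Markov tree is given as an undirected edge list over hashable node labels: re-rooting orients every edge away from the chosen root, an observed node that is not a leaf receives a leaf copy of itself, and a node with more than two children is split by inserting a chain of copies that equal it almost surely (so the Markov relations are unchanged). The labels `("copy", v, r)` and `("leaf", v)` are hypothetical bookkeeping names, and padding all leaves to a common depth $L'$ (again with copies) is omitted for brevity.

```python
from collections import deque


def reroot(edges, root):
    """Orient an undirected tree away from `root`; return {node: [children]}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    children, queue = {root: []}, deque([root])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in children:          # not visited yet
                children[v].append(w)
                children[w] = []
                queue.append(w)
    return children


def attach_leaf_copies(children, observed):
    """Give each observed internal node a child that equals it almost surely."""
    for v in observed:
        if children.get(v):
            leaf = ("leaf", v)
            children[v].append(leaf)
            children[leaf] = []
    return children


def binarize(children):
    """Split nodes with more than two children using a chain of a.s.-equal copies."""
    out = {}

    def visit(v, kids, base):
        if len(kids) <= 2:
            out[v] = list(kids)
        else:
            copy = ("copy", base, len(kids) - 1)   # equals `base` almost surely
            out[v] = [kids[0], copy]
            visit(copy, kids[1:], base)

    for v, kids in children.items():
        visit(v, kids, v)
    return out


# Hypothetical star: hidden node 0 joined to observations 1..4; re-root at x_1.
tree = binarize(attach_leaf_copies(reroot([(0, 1), (0, 2), (0, 3), (0, 4)], 1), [1, 2, 3, 4]))
print(tree)
```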
We illustrate the two key steps outlined above with an example with $N = 4$. Suppose
that $x_1, \ldots, x_4$ can be embedded in the tree depicted in Figure 6. This tree happens to be
binary, but unfortunately the root is not the source of interest, $x_1$. Figure 7 shows how to
construct a new Gauss-Markov tree that still preserves the Markov conditions but has $x_1$ as
its root. Finally, a binary Gauss-Markov tree of height 5 is constructed that has the original
four observations as a subset of its 16 leaf nodes; this is done in Figure 8, where any node
indicated by a dot is simply identically equal (almost surely) to its parent node. We then
set to zero the rates of all the encoders except those numbered 1, 9, 13, and 14. This
allows us to capture the rate-distortion region of the original three-help-one problem.



Figure 6: Four observations are embedded in a (binary) Gauss-Markov tree. 
3.2 Worst-Case Property 

As with our earlier result for the binary tree structure problem, the Gaussian assumption in
Theorem 3 can be justified on the grounds that it is the worst-case distribution. Specifically,
as in Section 2.6, let $\tilde{x}_1, \ldots, \tilde{x}_N$ denote an alternate source with the same covariances as
$x_1, \ldots, x_N$. Let $\widetilde{\mathcal{RD}}^*$ denote the rate-distortion region of the alternate source, and let $\widetilde{\mathcal{RD}}_{\rm in}$ denote
the inner bound obtained by replacing the source variables in the discussion in Section 2.3
with the alternate source $\tilde{x}_1, \ldots, \tilde{x}_N$.

Theorem 4 A Gaussian source that can be embedded in a Gauss-Markov tree has the
smallest rate-distortion region for its covariance:

$$\mathcal{RD}^* \subseteq \widetilde{\mathcal{RD}}^*.$$
Figure 7: The tree rewritten with $x_1$ as the root.




Figure 8: The many-help-one problem rewritten as a binary tree structure problem. 

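In fact, the separation-based architecture has the most difficulty compressing a Gaussian
source in the sense that

$$\mathcal{RD}^* = \mathrm{co}(\mathcal{RD}_{\rm in}) \subseteq \mathrm{co}(\widetilde{\mathcal{RD}}_{\rm in}) \subseteq \widetilde{\mathcal{RD}}^*.$$

The proof of Theorem 2 applies verbatim here.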

3.3 Tree Structure Condition and Computational Verification 

If $N = 2$, then $x_1$ and $x_2$ can always be placed in the trivial Gauss-Markov tree consisting
of these two variables; no embedding is needed in this case. We note that $N = 2$
corresponds to the "one-help-one" problem, whose rate-distortion region has been determined by
Oohama [6]. With $N \ge 3$, embedding is not always possible. We see an example of this next,
where we also see a simple test for when $N$ linearly independent variables can themselves be
arranged in a tree, without adding additional variables. We then derive a condition on the
covariance matrix of $x_1, \ldots, x_N$ that is necessary for these variables to be embedded as the
nodes of a general Gauss-Markov tree. Finally, we show that this condition is also sufficient
when $N = 3$.



3.3.1 Trees Without Embedding 



We next demonstrate a simple test for when $N$ linearly independent, jointly Gaussian random
variables can themselves be arranged in a tree, without adding additional variables. Without
loss of generality, we may assume that $x_1, \ldots, x_N$ each has unit variance (this can be ensured
by normalizing each observation). We shall write $\rho_{ij}$ for the correlation coefficient between
$x_i$ and $x_j$.

Suppose that $x_1, \ldots, x_N$ are linearly independent, and let $K_x$ denote their (invertible)
covariance matrix. We will use the following fact from the literature (Speed and Kiiveri [10]):

$x_1, \ldots, x_N$ are Markov with respect to a simple, undirected graph $G$ if and only
if for all $i \ne j$ such that $\{x_i, x_j\}$ is not an edge in $G$, the $(i, j)$ entry of $K_x^{-1}$ is zero.

Now let $G$ denote the simple, undirected graph with $x_1, \ldots, x_N$ as the nodes obtained by
interpreting $K_x^{-1} - I$ as the adjacency matrix: there is an edge between $x_i$ and $x_j$ ($i \ne j$) if and only
if the $(i, j)$ element of $K_x^{-1} - I$ is nonzero. It follows that $x_1, \ldots, x_N$ can be arranged in a
Gauss-Markov tree if and only if $G$ is a tree, or more generally, a forest (i.e., a collection of
unconnected trees).
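The test is easy to run exactly. The Python sketch below inverts $K_x$ over the rationals with `fractions.Fraction`, reads off the off-diagonal nonzero pattern of $K_x^{-1}$ as the edge set of $G$, and checks whether $G$ is a forest with a union-find cycle test; the function names are hypothetical. Applied to the two covariance matrices in this subsection, it reports a triangle for the $3 \times 3$ example and a star centered at $x_0$ for the $4 \times 4$ one.

```python
from fractions import Fraction


def inverse(K):
    """Exact inverse of a nonsingular matrix by Gauss-Jordan elimination."""
    n = len(K)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(K)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def concentration_graph(K):
    """Edges (i, j), i < j, with a nonzero (i, j) entry of K^{-1}."""
    P, n = inverse(K), len(K)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if P[i][j] != 0]


def is_forest(n, edges):
    """Union-find: the graph is a forest iff no edge closes a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
    return True


q, h = Fraction(1, 4), Fraction(1, 2)
K3 = [[1, q, q], [q, 1, q], [q, q, 1]]                              # (x1, x2, x3)
K4 = [[1, h, h, h], [h, 1, q, q], [h, q, 1, q], [h, q, q, 1]]       # (x0, x1, x2, x3)
print(concentration_graph(K3), is_forest(3, concentration_graph(K3)))
print(concentration_graph(K4), is_forest(4, concentration_graph(K4)))
```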

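This fact can be illustrated with the following example. Suppose that $N = 3$ and

$$K_x = \begin{pmatrix} 1 & 1/4 & 1/4 \\ 1/4 & 1 & 1/4 \\ 1/4 & 1/4 & 1 \end{pmatrix}.$$

Then

$$K_x^{-1} = \frac{1}{9}\begin{pmatrix} 10 & -2 & -2 \\ -2 & 10 & -2 \\ -2 & -2 & 10 \end{pmatrix}, \tag{35}$$

which yields a fully-connected graph. Hence $x_1$, $x_2$, and $x_3$ cannot be arranged in a
Gauss-Markov tree.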

Nevertheless, it is possible that $x_1, x_2, x_3$ can be embedded in a larger Gauss-Markov tree.
Indeed, in this case it turns out that it is possible to embed the variables in a tree of size 4.
We offer the following specific construction to demonstrate this fact. Let $x_0$ be a standard
Normal random variable and let

$$x_1 = \frac{1}{2}x_0 + z_1, \qquad x_2 = \frac{1}{2}x_0 + z_2, \qquad x_3 = \frac{1}{2}x_0 + z_3,$$
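where $z_1$, $z_2$, and $z_3$ are i.i.d. Gaussian with variance 3/4, and are independent of $x_0$. The
covariance matrix for this quadruple of variables, ordered as $(x_0, x_1, x_2, x_3)$, is

$$\begin{pmatrix} 1 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1 & 1/4 \\ 1/2 & 1/4 & 1/4 & 1 \end{pmatrix}.$$

The inverse of this matrix is

$$\frac{2}{3}\begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 2 & 0 & 0 \\ -1 & 0 & 2 & 0 \\ -1 & 0 & 0 & 2 \end{pmatrix},$$

with the resulting $G$ being the tree depicted in Figure 9: $x_0$ is joined to each of $x_1$, $x_2$, and $x_3$,
and there are no other edges.

Figure 9: Tree embedding for $x_1$, $x_2$, and $x_3$.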



3.3.2 Necessary Condition for Tree Embedding 

Even allowing additional variables in the Gauss-Markov tree, it can turn out that embedding
is impossible. Towards understanding the situation better, we derive a necessary condition
for $x_1, \ldots, x_N$ to be embeddable. It turns out that this condition is also sufficient when
$N = 3$.

Proposition 1 Let $N \ge 3$. If $x_1, \ldots, x_N$ can be embedded in a Gauss-Markov tree, then

$$|\rho_{ik}| \ge |\rho_{ij}\rho_{jk}| \tag{36}$$

and

$$\rho_{ik}\rho_{ij}\rho_{jk} \ge 0 \tag{37}$$

for all distinct $i$, $j$, and $k$. Conversely, if $N = 3$ and conditions (36) and (37) hold for all
distinct $i$, $j$, and $k$, then $x_1, \ldots, x_N$ can be embedded in a Gauss-Markov tree.

Proof: See Appendix E. □
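Conditions (36) and (37) are straightforward to screen for a given correlation matrix. The Python sketch below, with a hypothetical function name, checks every ordered triple of distinct indices. By Proposition 1, a `False` result rules out embedding for any $N \ge 3$, while a `True` result guarantees an embedding only when $N = 3$; for instance, the equicorrelated $3 \times 3$ example of Section 3.3.1 passes, consistent with the explicit four-node embedding given there.

```python
from itertools import permutations


def passes_tree_conditions(rho):
    """Check (36) |rho_ik| >= |rho_ij rho_jk| and (37) rho_ik rho_ij rho_jk >= 0."""
    n = len(rho)
    for i, j, k in permutations(range(n), 3):
        if abs(rho[i][k]) < abs(rho[i][j] * rho[j][k]):
            return False                   # violates (36)
        if rho[i][k] * rho[i][j] * rho[j][k] < 0:
            return False                   # violates (37)
    return True


print(passes_tree_conditions([[1, .25, .25], [.25, 1, .25], [.25, .25, 1]]))
print(passes_tree_conditions([[1, .9, .1], [.9, 1, .9], [.1, .9, 1]]))
```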
4 A Partial Converse



We have shown that if the source can be embedded in a Gauss-Markov tree, then the
separation-based scheme achieves the entire rate-distortion region for the many-help-one
problem. This raises the question of whether the tree-embeddability condition can be
relaxed, or whether it is necessary in order for the separation-based scheme to achieve the
entire rate-distortion region. We next show that it is reasonable to conjecture that
tree-embeddability, or a similar condition, is a necessary and sufficient condition for separation
to achieve the entire rate-distortion region. Our argument consists of two parts.

• First, we provide an example that shows that separation does not always achieve the 
entire rate-distortion region for the many-help-one problem, which establishes that 
some added condition is required. 

• We then establish a connection between this counterexample and the tree-embeddability
condition.



4.1 Suboptimality of Separation 

We begin by showing that the separation-based scheme does not always achieve the entire
rate-distortion region for the many-help-one problem. Consider the special case of three
sources ($N = 3$), where $x_1$ and $x_2$ have covariance matrix

$$\begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}, \qquad 0 < \rho < 1,$$

and where $x_3 = x_1 - x_2$. We shall assume that the goal is to reproduce $x_3$ at the decoder
and that $R_3 = 0$, i.e., the helpers completely shoulder the communication burden.

We shall focus in particular on the asymptotic regime in which $\sigma^2$ is large and $\rho$ is near one. Specifically, let

$$\rho = 1 - \frac{1}{2\sigma^2}$$

and consider the behavior of the rate-distortion region as $\sigma^2$ tends to infinity. Note that the variance of $x_3$ does not tend to infinity: since $\mathrm{Var}(x_3) = 2\sigma^2(1 - \rho)$, it equals one for any positive value of $\sigma^2$, due to our choice of $\rho$. In this regime, the separation-based scheme performs quite poorly.

Proposition 2 Let $0 < d < 1$ and let $R(\sigma^2, d)$ denote the minimum value of $R_1 + R_2$ such that $(R_1, R_2, 0, d)$ is in the rate-distortion region for the separation-based scheme. Then

$$\lim_{\sigma^2 \to \infty} R(\sigma^2, d) = \infty.$$

Proof: Please see Appendix F. □
We now exhibit a scheme whose sum rate is bounded as $\sigma^2$ tends to infinity. This scheme is simple in the sense that it operates on individual samples, not long blocks. Consider two lattices in $\mathbb{R}$,

$$\Lambda_f = \{k \cdot 2^{-n} : k \in \mathbb{Z}\}, \qquad \Lambda_c = \{k \cdot 2^{m} : k \in \mathbb{Z}\}.$$

Let $Q_f(x)$ denote the lattice point in $\Lambda_f$ that is closest to $x$; ties are broken arbitrarily. Let

$$x \bmod \Lambda_f = x - Q_f(x).$$

Analogous definitions for $\Lambda_c$ are also in effect. Let

$$\hat{x}_1(\ell) = Q_f(x_1(\ell)).$$

For each time $\ell$, the first encoder communicates

$$u_1(\ell) = \hat{x}_1(\ell) \bmod \Lambda_c$$

to the decoder. This requires sending $n + m$ bits per sample. The second encoder operates analogously, yielding a sum rate of $2(n + m)$ bits per sample. The decoder uses

$$\hat{x}_3(\ell) = [u_1(\ell) - u_2(\ell)] \bmod \Lambda_c$$

as its estimate for $x_3(\ell)$.

Proposition 3 For any $d > 0$, if $m$ and $n$ are sufficiently large, then

$$E[(\hat{x}_3(\ell) - x_3(\ell))^2] \le d$$

for all $\ell$ and all $\sigma^2$.

Proof: Please see Appendix G. □
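The scalar scheme is easy to simulate. The sketch below draws $(x_1, x_2)$ with the covariance of Section 4.1, applies the two modulo-lattice encoders and the decoder described above, and reports the empirical mean-squared error for $x_3 = x_1 - x_2$. The function names, the seed, and the sample size are our own illustrative choices; Python's built-in `round` (round-half-to-even) serves as the nearest-point quantizer, which is one admissible way of breaking ties.

```python
import math
import random

def nearest(x, step):
    """Nearest point of the lattice step * Z; Python's round() breaks ties to even."""
    return step * round(x / step)

def mod_lattice(x, step):
    """x mod (step * Z), i.e. x minus its nearest lattice point."""
    return x - nearest(x, step)

def simulate(sigma2, n, m, samples=20000, seed=0):
    """Empirical MSE of the scalar modulo-lattice scheme for x3 = x1 - x2."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    rho = 1.0 - 1.0 / (2.0 * sigma2)        # choice of Section 4.1: Var(x1 - x2) = 1
    fine, coarse = 2.0 ** (-n), 2.0 ** m    # spacings of Lambda_f and Lambda_c
    total = 0.0
    for _ in range(samples):
        x1 = sigma * rng.gauss(0.0, 1.0)
        x2 = rho * x1 + sigma * math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1 = mod_lattice(nearest(x1, fine), coarse)   # encoder 1 output
        u2 = mod_lattice(nearest(x2, fine), coarse)   # encoder 2 output
        x3_hat = mod_lattice(u1 - u2, coarse)         # decoder estimate
        total += (x3_hat - (x1 - x2)) ** 2
    return total / samples

for sigma2 in (1e2, 1e4, 1e6):
    print(sigma2, simulate(sigma2, n=6, m=4))
```

With $n = 6$ and $m = 4$ each encoder sends 10 bits per sample, and the printed error should stay near the fine-quantization level $2 \cdot 2^{-2n}/12$ for every $\sigma^2$, in line with Proposition 3.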
Since $n$ and $m$ need not tend to infinity as $\sigma^2$ grows, this simple scheme beats the separation-based approach by an arbitrarily large amount as $\sigma^2$ tends to infinity. The scheme can be improved by using higher-dimensional lattices for $\Lambda_f$ and $\Lambda_c$. This has been explored by Krithivasan and Pradhan [5].

Conceptually, the difference between the two schemes can be understood as follows. Consider the binary expansion of $x_1$. The quantity

$$Q_f(x_1) \bmod \Lambda_c$$

can be computed from the sign of $x_1$, the $m$ bits to the left of the binary point, and the $n+1$ bits to the right of the binary point. Thus, Proposition 3 shows that only these $n + m + 2$ bits are necessary for the purpose of reproducing the difference $x_1 - x_2$. In particular, it is not necessary to send the bits that are more significant than the block of $m$ bits to the left of the binary point. As a result of using a standard vector quantizer, however, the separation-based scheme effectively sends these most significant bits. If the variances of $x_1$ and $x_2$ are large, this is inefficient.
4.2 On the Necessity of the Tree Condition



The previous section shows that the separation-based architecture does not achieve the complete rate-distortion region when $x_1$ and $x_2$ are positively correlated and $x_3 = x_1 - x_2$, at least when the variances of $x_1$ and $x_2$ are large and their correlation coefficient is near one. This is also true of the problem in which $x_1$ and $x_2$ are negatively correlated and $x_3 = x_1 + x_2$. The defining feature of these two examples is that if $E[x_3 \mid x_1, x_2] = a_1 x_1 + a_2 x_2$, then

$$a_1 \cdot a_2 \cdot E[x_1 x_2] < 0. \qquad (38)$$

We next show that for $N = 3$, if the sources cannot be embedded in a Gauss-Markov tree, then this condition holds, except for a possible relabeling.

Proposition 4 For $N = 3$, if $x_1$, $x_2$, and $x_3$ cannot be embedded in a Gauss-Markov tree, then (38) holds for some relabeling of $x_1$, $x_2$, and $x_3$.

Proof: Please see Appendix H. □
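Condition (38) involves only second-order statistics, so it can be checked directly from a covariance matrix. The sketch below assumes zero-mean sources and a nonsingular covariance for $(x_1, x_2)$; the helper names are ours. It solves the 2×2 normal equations for $E[x_3 \mid x_1, x_2]$ by Cramer's rule and evaluates the sign in (38) for the example of Section 4.1.

```python
def lmmse_coefficients(K):
    """Coefficients (a1, a2) of E[x3 | x1, x2] = a1*x1 + a2*x2.

    K is the 3x3 covariance of (x1, x2, x3) as nested lists (zero means assumed).
    """
    k11, k12, k22 = K[0][0], K[0][1], K[1][1]
    c1, c2 = K[0][2], K[1][2]             # Cov(x1, x3), Cov(x2, x3)
    det = k11 * k22 - k12 ** 2
    a1 = (k22 * c1 - k12 * c2) / det      # Cramer's rule on the normal equations
    a2 = (k11 * c2 - k12 * c1) / det
    return a1, a2

def condition_38(K):
    """True when a1 * a2 * E[x1 x2] < 0."""
    a1, a2 = lmmse_coefficients(K)
    return a1 * a2 * K[0][1] < 0

sigma2 = 100.0
rho = 1 - 1 / (2 * sigma2)
# x3 = x1 - x2: Cov(x1, x3) = sigma2*(1 - rho), Cov(x2, x3) = -sigma2*(1 - rho), Var(x3) = 1
K = [[sigma2, rho * sigma2, sigma2 * (1 - rho)],
     [rho * sigma2, sigma2, -sigma2 * (1 - rho)],
     [sigma2 * (1 - rho), -sigma2 * (1 - rho), 1.0]]
print(lmmse_coefficients(K), condition_38(K))
```

Here the coefficients come out as approximately $(1, -1)$ and $E[x_1 x_2] > 0$, so (38) holds.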



A Proof of Lemma 3 



Consider any encoding-decoding procedure that achieves the rate-distortion tuple

$$(R_1, R_2, \ldots, R_{2^{L-1}}, d)$$

for the binary tree structure problem over a block of time of length $n$. Let the discrete set $C_i$ denote the output of encoder $i$ (for $i = 1, \ldots, 2^{L-1}$). We have that

$$R_i \ge \frac{1}{n}\log|C_i|, \qquad i = 1, \ldots, 2^{L-1}, \qquad (39)$$

$$d \ge \frac{1}{n}\sum_{m=1}^{n} \mathrm{Var}\big(x^{(1)}_1(m) \mid \mathcal{C}\big). \qquad (40)$$

Here we have denoted

$$\mathcal{C} = \{C_1, \ldots, C_{2^{L-1}}\}, \qquad (41)$$

the set of all the encoder outputs. Further, the distributed nature of encoding imposes natural Markov chain conditions on the encoder outputs with respect to the observations. These Markov chain conditions are described in Figure 10.

Recall our earlier definition of the ancestors set (c.f. Equation (30)),

$$A^{(k)} = \{i : O(x^{(k)}_i) \cap A \neq \emptyset\}, \qquad (42)$$

where $\emptyset$ is the null set. Now define

$$x^{(k)}_{A,n} \overset{\text{def}}{=} \{x^{(k)}_{i,n} : i \in A^{(k)}\}. \qquad (43)$$
Figure 10: The tree structure with the encoder outputs over a block of length $n$.

Our outer bound will consider arbitrary subsets $A$ of $\{1, \ldots, 2^{L-1}\}$. Denote the set

$$C_A = \{C_i : i \in A\}.$$
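The sum of any subset $A$ of the encoder rates satisfies

$$n\sum_{i \in A} R_i \ge \sum_{i \in A} H(C_i) \ge H(C_A) \ge H(C_A \mid C_{A^c}) \qquad (44)$$

$$\overset{(a)}{=} I\big(x^{(1)}_{A,n}, \ldots, x^{(L-1)}_{A,n}, x^{(L)}_{A,n}; C_A \mid C_{A^c}\big) \qquad (45)$$

$$\overset{(b)}{=} \sum_{k=1}^{L} I\big(x^{(k)}_{A,n}; C_A \mid C_{A^c}, x^{(k-1)}_{A,n}\big) \qquad (46)$$

$$\overset{(c)}{=} \sum_{k=1}^{L} \Big( I\big(x^{(k)}_{A,n}; \mathcal{C} \mid x^{(k-1)}_{A,n}\big) - I\big(x^{(k)}_{A,n}; C_{A^c} \mid x^{(k-1)}_{A,n}\big) \Big). \qquad (47)$$

Here each of the steps (a), (b), and (c) follows from the Markov chain conditions described in Figure 10. We use the chain rule to expand each of the mutual information terms in the lower bound of Equation (47):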



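$$I\big(x^{(k)}_{A,n}; \mathcal{C} \mid x^{(k-1)}_{A,n}\big) = \sum_{i \in A^{(k)}} I\big(x^{(k)}_{i,n}; \mathcal{C} \mid x^{(k-1)}_{A,n}, x^{(k)}_{j,n}, j < i, j \in A^{(k)}\big) \qquad (48)$$

$$= \sum_{i \in A^{(k)}} I\big(x^{(k)}_{i,n}; \mathcal{C} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) \qquad (49)$$

and

$$I\big(x^{(k)}_{A,n}; C_{A^c} \mid x^{(k-1)}_{A,n}\big) = \sum_{i \in A^{(k)}} I\big(x^{(k)}_{i,n}; C_{A^c} \mid x^{(k-1)}_{A,n}, x^{(k)}_{j,n}, j < i, j \in A^{(k)}\big) \qquad (50)$$

$$= \sum_{i \in A^{(k)}} I\big(x^{(k)}_{i,n}; C_{A^c} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big). \qquad (51)$$

Here both Equations (49) and (51) follow from the Markov chain conditions described in Figure 10. Denote by

$$r^{(k)}_i \overset{\text{def}}{=} \frac{1}{n} I\big(x^{(k)}_{i,n}; \mathcal{C} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) \qquad (52)$$

the term inside the summation in Equation (49), normalized by $n$. Then $r^{(1)}_1$ is the number of bits per sample that the encoders send about the root of the tree, and $r^{(k)}_i$ for $k > 1$ can be interpreted as the number of bits per sample that the encoders use to represent the noise introduced at node $x^{(k)}_i$. We will upper bound the terms inside the summation in Equation (51) in terms of these quantities. To do this, we start with a central preliminary lemma.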



A.1 A Preliminary Lemma

Consider four memoryless jointly Gaussian random processes $w(m), x(m), y(m), z(m)$, $m = 1, \ldots, n$. They are identically jointly distributed in the (time) index $m$. At any given time index $m$, their joint distribution satisfies the Markov chain conditions implied in Figure 11.

Figure 11: The Markov chain conditions.

Then we can write, for all $m = 1, \ldots, n$,

$$x(m) = a_{xw}\, w(m) + n_0(m),$$
$$y(m) = a_{yx}\, x(m) + n_1(m),$$
$$z(m) = a_{zx}\, x(m) + n_2(m),$$

for some real $a_{xw}, a_{yx}, a_{zx}$. Here $n_0(m), n_1(m), n_2(m)$, $m = 1, \ldots, n$, are i.i.d. in time, independent of each other, and independent of the process $w(m)$, $m = 1, \ldots, n$. Further, the random variables $n_0(m), n_1(m), n_2(m), w(m)$ at any time index $m$ are $\mathcal{N}(0, \sigma^2_{n_0})$, $\mathcal{N}(0, \sigma^2_{n_1})$, $\mathcal{N}(0, \sigma^2_{n_2})$, and $\mathcal{N}(0, \sigma^2_w)$, respectively.
Write the vectors

$$w_n = [w(1), \ldots, w(n)], \qquad (53)$$
$$x_n = [x(1), \ldots, x(n)], \qquad (54)$$
$$y_n = [y(1), \ldots, y(n)], \qquad (55)$$
$$z_n = [z(1), \ldots, z(n)]. \qquad (56)$$



Consider two random variables $C_1, C_2$ that satisfy the following two Markov chain conditions:

$$(w_n, x_n, z_n, C_2) \leftrightarrow y_n \leftrightarrow C_1, \qquad (57)$$
$$(w_n, x_n, y_n, C_1) \leftrightarrow z_n \leftrightarrow C_2. \qquad (58)$$

Our first inequality concerns this Markov chain condition. We intentionally use notation similar to that introduced in Section 2.4.



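Lemma 5 Define

$$r_1 = \frac{1}{n} I(y_n; C_1 \mid x_n), \qquad r_2 = \frac{1}{n} I(z_n; C_2 \mid x_n),$$

$$f_x(r_1, r_2) \overset{\text{def}}{=} \frac{1}{2}\log\left(1 + \frac{a_{yx}^2\sigma_{n_0}^2}{\sigma_{n_1}^2}\big(1 - e^{-2r_1}\big) + \frac{a_{zx}^2\sigma_{n_0}^2}{\sigma_{n_2}^2}\big(1 - e^{-2r_2}\big)\right).$$

Then

$$\frac{1}{n} I(x_n; C_1, C_2 \mid w_n) \le f_x(r_1, r_2), \qquad (59)$$
$$\frac{1}{n} I(x_n; C_1 \mid w_n) \le f_x(r_1, 0), \qquad (60)$$
$$\frac{1}{n} I(x_n; C_2 \mid w_n) \le f_x(0, r_2). \qquad (61)$$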

Proof: This lemma is a conditional version (conditioned on $w_n$) of Lemma 3 in [7]. The proof follows "mutatis mutandis" that of Lemma 3 in [7]; the only extra fact needed is that conditioned on any realization of $w_n$, $(x_n, y_n, z_n)$ are jointly Gaussian with their original variances and $(x_n, y_n, z_n, C_1, C_2)$ satisfies the Markov condition

$$C_1 \leftrightarrow y_n \leftrightarrow x_n \leftrightarrow z_n \leftrightarrow C_2.$$

Specifically, suppose first that $a_{yx}$ and $a_{zx}$ are nonzero. For any realization of $w_n$, say $\bar{w}_n$, Oohama [7, Lemma 3] has shown that

$$\frac{1}{n} I(x_n; C_1, C_2 \mid w_n = \bar{w}_n) \le f_x(r_1, r_2).$$

By averaging the left-hand side over $w_n$, we obtain (59). The proofs of (60) and (61) are similar. If both $a_{yx}$ and $a_{zx}$ are zero, then the result is trivial. If, say, only $a_{yx}$ is zero, then $I(x_n; C_1, C_2 \mid w_n) = I(x_n; C_2 \mid w_n)$ and (59) follows from (61). □



A.1.1 Sufficient Conditions for Equality

It is useful to observe the conditions for equality in (59), (60), and (61): suppose

$$C_k = [u_k(1), \ldots, u_k(n)], \qquad k = 1, 2. \qquad (62)$$

Here

$$u_1(m) = \alpha_1 y(m) + v_1(m), \qquad m = 1, \ldots, n,$$
$$u_2(m) = \alpha_2 z(m) + v_2(m), \qquad m = 1, \ldots, n,$$

where $v_1(m)$ and $v_2(m)$ are Gaussian and independent of each other and of $w_n, x_n, y_n, z_n$, and are i.i.d. in the time index $m$. Then it can be verified directly that with this choice of $C_1, C_2$ (c.f. Equation (62)) the inequalities in Equations (59), (60), and (61) are all simultaneously met with equality (this verification is also done in [HE]). This fact will be used later to show that the achievable region of the separation-based inner bound coincides with the outer bound.
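For scalar Gaussian test channels the equality claim can be checked numerically. The sketch below takes $n = 1$ (the test channels are i.i.d., so all quantities are per-letter), sets $\alpha_1 = \alpha_2 = 1$, and uses arbitrary illustrative values for the gains and variances, which are our assumptions. It computes $I(x; u_1, u_2 \mid w)$ from covariance determinants and compares it with $f_x(r_1, r_2)$ evaluated at $r_1 = I(y; u_1 \mid x)$ and $r_2 = I(z; u_2 \mid x)$. Natural logarithms are used, matching the $e^{-2r}$ terms in $f_x$.

```python
import math

def f_x(r1, r2, a_yx, a_zx, s0, s1, s2):
    """f_x(r1, r2) of Lemma 5; s0, s1, s2 are the variances of n0, n1, n2."""
    return 0.5 * math.log(1 + a_yx ** 2 * s0 / s1 * (1 - math.exp(-2 * r1))
                            + a_zx ** 2 * s0 / s2 * (1 - math.exp(-2 * r2)))

def gaussian_test_channel(a_yx, a_zx, s0, s1, s2, v1, v2):
    """Return (I(x; u1, u2 | w), r1, r2) for u1 = y + v1, u2 = z + v2 (v1, v2: noise variances)."""
    # Given w, x has variance s0 and u = (u1, u2) is jointly Gaussian.
    c11 = a_yx ** 2 * s0 + s1 + v1
    c22 = a_zx ** 2 * s0 + s2 + v2
    c12 = a_yx * a_zx * s0
    det_u_given_w = c11 * c22 - c12 ** 2
    det_u_given_x = (s1 + v1) * (s2 + v2)   # given x, u1 and u2 are independent
    mi = 0.5 * math.log(det_u_given_w / det_u_given_x)
    r1 = 0.5 * math.log((s1 + v1) / v1)     # I(y; u1 | x)
    r2 = 0.5 * math.log((s2 + v2) / v2)     # I(z; u2 | x)
    return mi, r1, r2

params = dict(a_yx=0.8, a_zx=-1.3, s0=2.0, s1=0.5, s2=1.5)
mi, r1, r2 = gaussian_test_channel(**params, v1=0.3, v2=2.0)
print(mi, f_x(r1, r2, **params))   # the two values agree up to rounding
```

Algebraically, both sides reduce to $\frac{1}{2}\log\big(1 + a_{yx}^2\sigma_{n_0}^2/(\sigma_{n_1}^2 + \sigma_{v_1}^2) + a_{zx}^2\sigma_{n_0}^2/(\sigma_{n_2}^2 + \sigma_{v_2}^2)\big)$, which is why the equality holds for any choice of test-channel noise variances.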



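A.1.2 An Important Instance

Of specific interest to us will be the following association of the random variables in Figure 11 to the binary tree structure: fix $1 \le k \le L - 1$ and $1 \le i \le 2^{k-1}$. Then let

$$x = x^{(k)}_i, \qquad (63)$$
$$y = x^{(k+1)}_{2i-1}, \qquad (64)$$
$$z = x^{(k+1)}_{2i}, \qquad (65)$$
$$w = x^{(k-1)}_{\lfloor (i+1)/2 \rfloor}. \qquad (66)$$

With this association, denote the function corresponding to $f_x$ in Equation (59) by $f^{(k)}_{x_i}$:

$$f^{(k)}_{x_i}(r_1, r_2) \overset{\text{def}}{=} \frac{1}{2}\log\left[1 + \frac{a_{yx}^2\sigma_{n_0}^2}{\sigma_{n_1}^2}\big(1 - e^{-2r_1}\big) + \frac{a_{zx}^2\sigma_{n_0}^2}{\sigma_{n_2}^2}\big(1 - e^{-2r_2}\big)\right], \qquad r_1, r_2 \ge 0, \qquad (67)$$

where the gains and noise variances are those induced by the association (63)-(66). Indeed, this is the same notation as that introduced earlier in the paper.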



A.2 An Iteration Lemma

As an immediate application of the preliminary lemma derived in the previous section, consider any subset $A \subseteq \{1, \ldots, 2^{L-1}\}$. Fix $1 \le k \le L - 1$ and $1 \le i \le 2^{k-1}$. For simplicity of notation, let us suppose that $x^{(0)}_1$ is a zero random variable.
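Lemma 6

$$\frac{1}{n} I\big(x^{(k)}_{i,n}; C_A \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) \le f^{(k)}_{x_i}\left(\frac{1}{n} I\big(x^{(k+1)}_{2i-1,n}; C_A \mid x^{(k)}_{i,n}\big), \frac{1}{n} I\big(x^{(k+1)}_{2i,n}; C_A \mid x^{(k)}_{i,n}\big)\right). \qquad (68)$$

Proof: For any node $x^{(k)}_i$, recall the set of associated observations defined as (c.f. Equation (25))

$$O(x^{(k)}_i) = \{l : 2^{L-k}(i-1) < l \le 2^{L-k} i\}.$$

With this definition, we observe that

$$I\big(x^{(k)}_{i,n}; C_A \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) = I\big(x^{(k)}_{i,n}; C_{A \cap O(x^{(k)}_i)} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big).$$

Then we only need to invoke Lemma 5 with the following random variables:

$$w_n = x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}, \quad x_n = x^{(k)}_{i,n}, \quad y_n = x^{(k+1)}_{2i-1,n}, \quad z_n = x^{(k+1)}_{2i,n},$$
$$C_1 = C_{A \cap O(x^{(k+1)}_{2i-1})}, \qquad C_2 = C_{A \cap O(x^{(k+1)}_{2i})}.$$

This completes the proof. □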
Observe that the parameters inside the function $f^{(k)}_{x_i}(\cdot, \cdot)$ are themselves of the type of the term in the left-hand side of Equation (68). Then, we can repeatedly apply Lemma 6. As an example, for $k \le L - 2$ and $1 \le i \le 2^{k-1}$, the two parameters of $f^{(k)}_{x_i}$ in Equation (68) are upper bounded by

$$\frac{1}{n} I\big(x^{(k+1)}_{2i-1,n}; C_A \mid x^{(k)}_{i,n}\big) \le f^{(k+1)}_{x_{2i-1}}\left(\frac{1}{n} I\big(x^{(k+2)}_{4i-3,n}; C_A \mid x^{(k+1)}_{2i-1,n}\big), \frac{1}{n} I\big(x^{(k+2)}_{4i-2,n}; C_A \mid x^{(k+1)}_{2i-1,n}\big)\right), \qquad (69)$$

$$\frac{1}{n} I\big(x^{(k+1)}_{2i,n}; C_A \mid x^{(k)}_{i,n}\big) \le f^{(k+1)}_{x_{2i}}\left(\frac{1}{n} I\big(x^{(k+2)}_{4i-1,n}; C_A \mid x^{(k+1)}_{2i,n}\big), \frac{1}{n} I\big(x^{(k+2)}_{4i,n}; C_A \mid x^{(k+1)}_{2i,n}\big)\right). \qquad (70)$$

Now the function $f^{(k)}_{x_i}(\cdot, \cdot)$ is monotonically increasing in both of its parameters (this is true for each $1 \le k \le L - 1$ and $1 \le i \le 2^{k-1}$). So, we can combine Equations (68), (70), and (69) to get

$$\frac{1}{n} I\big(x^{(k)}_{i,n}; C_A \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) \le f^{(k)}_{x_i}\Big(f^{(k+1)}_{x_{2i-1}}\big(\tfrac{1}{n} I(x^{(k+2)}_{4i-3,n}; C_A \mid x^{(k+1)}_{2i-1,n}), \tfrac{1}{n} I(x^{(k+2)}_{4i-2,n}; C_A \mid x^{(k+1)}_{2i-1,n})\big), f^{(k+1)}_{x_{2i}}\big(\tfrac{1}{n} I(x^{(k+2)}_{4i-1,n}; C_A \mid x^{(k+1)}_{2i,n}), \tfrac{1}{n} I(x^{(k+2)}_{4i,n}; C_A \mid x^{(k+1)}_{2i,n})\big)\Big). \qquad (71)$$
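The stage is now set to recursively apply Lemma 6. Continuing this process until the boundary conditions are met, we arrive at

$$\frac{1}{n} I\big(x^{(k)}_{i,n}; C_A \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor, n}\big) \le f^{(k)}_{x_i}\big(r_A(x^{(k)}_i)\big). \qquad (72)$$

Here the set $r_A(x^{(k)}_i)$ is defined as in Equation (26):

$$r_A(x^{(k)}_i) = \Big\{ r^{(l)}_j : x^{(l)}_j \in \mathcal{T}(x^{(k)}_i),\ O(x^{(l)}_j) \subseteq A,\ x^{(l-1)}_{\lfloor (j+1)/2 \rfloor} \in \mathcal{T}(x^{(k)}_i) \text{ with } O(x^{(l-1)}_{\lfloor (j+1)/2 \rfloor}) \not\subseteq A, \text{ and } x^{(l)}_j \in \mathcal{T}(x^{(k)}_i) \cup \{x^{(k)}_i\} \Big\}. \qquad (73)$$

The function $f^{(k)}_{x_i}(\cdot)$ was also defined in Section 2.4.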

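A.3 Putting Them Together

We are now ready to complete the proof of Lemma 3. First, we substitute Equation (72), with $A^c$ in place of $A$, in Equation (51) to get

$$\frac{1}{n} I\big(x^{(k)}_{A,n}; C_{A^c} \mid x^{(k-1)}_{A,n}\big) \le \sum_{i \in A^{(k)}} f^{(k)}_{x_i}\big(r_{A^c}(x^{(k)}_i)\big). \qquad (74)$$

Combining Equation (74) with Equations (49) and (52), we can rewrite the inequality in Equation (47) as

$$\sum_{i \in A} R_i \ge \sum_{k=1}^{L} \sum_{i \in A^{(k)}} \Big( r^{(k)}_i - f^{(k)}_{x_i}\big(r_{A^c}(x^{(k)}_i)\big) \Big). \qquad (75)$$

The quantities $r^{(k)}_i$ satisfy other natural inequalities as well:

• Supposing that $A$ equals the entire set $\{1, 2, \ldots, 2^{L-1}\}$ and substituting in Lemma 6, we have

$$r^{(k)}_i \le f^{(k)}_{x_i}\big(r^{(k+1)}_{2i-1}, r^{(k+1)}_{2i}\big). \qquad (76)$$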






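• By direct calculation we also have

$$r^{(1)}_1 = \frac{1}{n} I\big(x^{(1)}_{1,n}; \mathcal{C}\big) \qquad (77)$$
$$= \frac{1}{2}\log\big(2\pi e\,\sigma^2_{x^{(1)}_1}\big) - \frac{1}{n} h\big(x^{(1)}_{1,n} \mid \mathcal{C}\big) \qquad (78)$$
$$\ge \frac{1}{2}\log\big(2\pi e\,\sigma^2_{x^{(1)}_1}\big) - \frac{1}{n} h\big(x^{(1)}_{1,n} - E[x^{(1)}_{1,n} \mid \mathcal{C}]\big) \qquad (79)$$
$$\ge \frac{1}{2}\log\big(2\pi e\,\sigma^2_{x^{(1)}_1}\big) - \frac{1}{2n}\log\Big((2\pi e)^n \det \mathrm{Covar}\big(x^{(1)}_{1,n} - E[x^{(1)}_{1,n} \mid \mathcal{C}]\big)\Big) \qquad (80)$$
$$\ge \frac{1}{2}\log\big(2\pi e\,\sigma^2_{x^{(1)}_1}\big) - \frac{1}{2}\log\Big(2\pi e\,\frac{1}{n}\mathrm{Trace}\,\mathrm{Covar}\big(x^{(1)}_{1,n} - E[x^{(1)}_{1,n} \mid \mathcal{C}]\big)\Big) \qquad (81)$$
$$= \frac{1}{2}\log\frac{\sigma^2_{x^{(1)}_1}}{\frac{1}{n}\sum_{m=1}^{n}\mathrm{Var}\big(x^{(1)}_1(m) \mid \mathcal{C}\big)} \qquad (82)$$
$$\ge \frac{1}{2}\log\frac{\sigma^2_{x^{(1)}_1}}{d}, \qquad (83)$$

where:

- Equation (79) follows from the fact that conditioning only reduces the differential entropy;
- Equation (80) is the usual bound on the differential entropy of a vector by the determinant of its covariance matrix;
- Equation (81) follows from the Hadamard inequality (together with the arithmetic-geometric mean inequality), which bounds the determinant of a positive definite matrix in terms of its trace;
- Equation (83) follows from the fact that the encoder outputs describe the original root node of the tree with sufficiently small quadratic distortion (c.f. Equation (40)).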

Based on Equations (76) and (83), we see that the set of $r^{(k)}_i$ indeed belongs to the set $\mathcal{F}_r(d)$ defined in Equation (27). Combining this fact with the key inequality in Equation (75), we have completed the proof of the outer bound in Lemma 3. □



B Proof of Lemma 4 



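Since we know that

$$\mathrm{co}(\mathcal{RD}_{\rm in}) \subseteq \mathcal{RD}_{\rm out},$$

it suffices to prove that for any $d$ and any componentwise nonnegative vector $(\alpha_1, \ldots, \alpha_{2^{L-1}})$,

$$\inf_{R : (R, d) \in \mathcal{RD}_{\rm out}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i \ge \inf_{R : (R, d) \in \mathcal{RD}_{\rm in}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i. \qquad (84)$$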
We will assume that $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_{2^{L-1}}$. The proof for the other orderings is similar. We will also use the convention $\alpha_0 = 0$. Now for any $R_1, \ldots, R_{2^{L-1}}$,

$$\sum_{i=1}^{2^{L-1}} \alpha_i R_i = \alpha_1 \sum_{i=1}^{2^{L-1}} R_i + (\alpha_2 - \alpha_1) \sum_{i=2}^{2^{L-1}} R_i + \cdots + (\alpha_{2^{L-1}} - \alpha_{2^{L-1}-1}) R_{2^{L-1}} = \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i=j}^{2^{L-1}} R_i.$$
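The rearrangement above is summation by parts. As a quick sanity check under the convention $\alpha_0 = 0$, the sketch below compares both sides for random nonnegative weights sorted in increasing order and arbitrary rates; the helper name `tail_sum_form` is ours.

```python
import random

def tail_sum_form(alpha, R):
    """sum_j (alpha_j - alpha_{j-1}) * sum_{i >= j} R_i, with alpha_0 = 0."""
    total, prev = 0.0, 0.0
    for j in range(len(alpha)):
        total += (alpha[j] - prev) * sum(R[j:])
        prev = alpha[j]
    return total

rng = random.Random(1)
alpha = sorted(rng.random() for _ in range(8))   # alpha_1 <= ... <= alpha_8
R = [rng.random() for _ in range(8)]
direct = sum(a * r for a, r in zip(alpha, R))
print(direct, tail_sum_form(alpha, R))           # equal up to floating-point rounding
```

Because the $\alpha_j$ are sorted, every coefficient $\alpha_j - \alpha_{j-1}$ is nonnegative, which is what allows each tail sum $\sum_{i \ge j} R_i$ to be lower-bounded separately in the argument that follows.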

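Thus

$$\inf_{R : (R,d) \in \mathcal{RD}_{\rm out}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i = \inf_{R : (R,d) \in \mathcal{RD}_{\rm out}} \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i=j}^{2^{L-1}} R_i.$$

Let $\epsilon > 0$. Then there exist $s \in \mathcal{F}_r(d)$ and $R^*$ such that

$$\sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i=j}^{2^{L-1}} R^*_i \le \inf_{R : (R,d) \in \mathcal{RD}_{\rm out}} \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i=j}^{2^{L-1}} R_i + \epsilon$$

and

$$\sum_{i \in A} R^*_i \ge \sum_{k=1}^{L} \sum_{i \in A^{(k)}} \Big( s^{(k)}_i - f^{(k)}_{x_i}\big(s_{A^c}(x^{(k)}_i)\big) \Big)$$

for all $A$. Let

$$A_j = \{j, \ldots, 2^{L-1}\} \cap \{i : s^{(L)}_i > 0\}.$$

Then

$$\sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i=j}^{2^{L-1}} R^*_i \ge \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{i \in A_j} R^*_i$$
$$\ge \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( s^{(k)}_i - f^{(k)}_{x_i}\big(s_{A_j^c}(x^{(k)}_i)\big) \Big)$$
$$\ge \inf \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( r^{(k)}_i - f^{(k)}_{x_i}\big(r_{A_j^c}(x^{(k)}_i)\big) \Big),$$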
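where the infimum is over all $r$ in $\mathcal{F}_r(d)$ such that $r^{(L)}_i = 0$ if and only if $s^{(L)}_i = 0$. Then there exists $s' \in \mathcal{F}_r(d)$ such that $s'^{(L)}_i = 0$ if and only if $s^{(L)}_i = 0$, and

$$\sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( s'^{(k)}_i - f^{(k)}_{x_i}\big(s'_{A_j^c}(x^{(k)}_i)\big) \Big) \le \inf \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( r^{(k)}_i - f^{(k)}_{x_i}\big(r_{A_j^c}(x^{(k)}_i)\big) \Big) + \epsilon, \qquad (85)$$

and the $s'^{(k)}_i$ minimize

$$\sum_{k=1}^{L} \sum_{i=1}^{2^{k-1}} s'^{(k)}_i \qquad (86)$$

among all such points. Since $s'$ and $s$ have the same zero pattern at the leaves, the sets $A_j$ are unchanged; to lighten notation we write $s$ for $s'$ from now on.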

Now since the $s^{(k)}_i$ are in $\mathcal{F}_r(d)$, we have

$$s^{(1)}_1 \ge \frac{1}{2}\log\frac{\sigma^2_{x^{(1)}_1}}{d}, \qquad (87)$$
$$s^{(k)}_i \le f^{(k)}_{x_i}\big(s^{(k+1)}_{2i-1}, s^{(k+1)}_{2i}\big). \qquad (88)$$

We will show that both of these inequalities must actually be equalities. Since decreasing $s^{(1)}_1$ does not increase the left-hand side of (85), and the $s^{(k)}_i$ minimize (86), it follows that the inequality (87) must be tight.
Next suppose that

$$s^{(k)}_m < f^{(k)}_{x_m}\big(s^{(k+1)}_{2m-1}, s^{(k+1)}_{2m}\big) \qquad (89)$$

for some non-leaf node $x^{(k)}_m$. We will show that this is incompatible with the assumption that the $s^{(k)}_i$ minimize (86). Without loss of generality, we may assume that none of the children of $x^{(k)}_m$ have a strict inequality in (88). In order for (89) to hold, $s^{(L)}_j$ must be positive for at least one leaf variable $x^{(L)}_j$ under $x^{(k)}_m$. Consider the leaf variable $x^{(L)}_{\hat m}$ under $x^{(k)}_m$ with the largest index $\hat m$ such that $s^{(L)}_{\hat m}$ is positive:

$$\hat m = \arg\max\Big\{\frac{2^L(m-1)}{2^k} < j \le \frac{2^L m}{2^k} : s^{(L)}_j > 0\Big\}.$$

Then consider the child $x^{(k+1)}_{m'}$ of $x^{(k)}_m$ that leads to the leaf variable $x^{(L)}_{\hat m}$. Note that we must have $s^{(k+1)}_{m'} > 0$.

Suppose that we decrease $s^{(k+1)}_{m'}$ by a slight amount such that (89) still holds. Fix a $j$ in $\{1, \ldots, 2^{L-1}\}$ and consider the sum

$$\sum_{l=1}^{L} \sum_{i \in A_j^{(l)}} \Big( s^{(l)}_i - f^{(l)}_{x_i}\big(s_{A_j^c}(x^{(l)}_i)\big) \Big), \qquad (90)$$

and recall that

$$A_j = \{j, \ldots, 2^{L-1}\} \cap \{i : s^{(L)}_i > 0\}.$$

Now if $j > \hat m$, then none of the observations under $x^{(k)}_m$ are in $A_j$, which implies that the sum in (90) does not depend on $s^{(k+1)}_{m'}$. On the other hand, if $j \le \hat m$, then the observation $\hat m$ under $x^{(k+1)}_{m'}$ is in $A_j$, so not all of the observations under $x^{(k+1)}_{m'}$ are in $A_j^c$, and so

$$s^{(k+1)}_{m'} \notin s_{A_j^c}(x^{(l)}_i)$$

for all $x^{(l)}_i$. It follows that the objective in (85) is not increased while the sum in (86) is reduced by decreasing $s^{(k+1)}_{m'}$, which is a contradiction. Thus (89) cannot hold at any non-leaf node in the tree. We have thus shown that equality must hold in (87) and (88).
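We are now in a position to show that

$$\sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( s^{(k)}_i - f^{(k)}_{x_i}\big(s_{A_j^c}(x^{(k)}_i)\big) \Big) \ge \inf_{R : (R,d) \in \mathcal{RD}_{\rm in}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i.$$

Specifically, choose the auxiliary random variables $u$ in the Berger-Tung inner bound such that

$$I\big(x^{(L)}_i; u_i \mid x^{(L-1)}_{\lfloor (i+1)/2 \rfloor}\big) = s^{(L)}_i$$

for each observation $i$. We will first show by induction that

$$I\big(x^{(k)}_i; u \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor}\big) = s^{(k)}_i \qquad (91)$$

for all variables $x^{(k)}_i$ in the tree. This is true of the leaf variables $x^{(L)}_i$, $i = 1, \ldots, 2^{L-1}$, by hypothesis. Next consider a variable $x^{(k)}_i$ and suppose the condition holds for $x^{(k+1)}_{2i-1}$ and $x^{(k+1)}_{2i}$. By the observation in Appendix A.1.1 and the equality in (88),

$$I\big(x^{(k)}_i; u \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor}\big) = f^{(k)}_{x_i}\Big(I\big(x^{(k+1)}_{2i-1}; u \mid x^{(k)}_i\big), I\big(x^{(k+1)}_{2i}; u \mid x^{(k)}_i\big)\Big) = f^{(k)}_{x_i}\big(s^{(k+1)}_{2i-1}, s^{(k+1)}_{2i}\big) = s^{(k)}_i.$$

This establishes (91). Then, using the equality in (87),

$$E\Big[\big(x^{(1)}_1 - E[x^{(1)}_1 \mid u]\big)^2\Big] = \sigma^2_{x^{(1)}_1} \exp\big(-2 s^{(1)}_1\big) = d.$$

Thus $u$ is in $\mathcal{U}(d)$. If we let

$$R_i = I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_i \mid u_1, \ldots, u_{i-1}\big),$$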
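then $(R, d)$ is in $\mathcal{RD}_{\rm in}$. Since $u_i$ is conditionally independent of the other auxiliary variables and all of the other source variables given $x^{(L)}_i$, it follows that $s^{(L)}_i = 0$ if and only if $u_i$ is independent of all of the other variables. We will show that

$$\sum_{i=j}^{2^{L-1}} R_i = I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_j, \ldots, u_{2^{L-1}} \mid u_1, \ldots, u_{j-1}\big)$$

by induction. For $j = 2^{L-1}$, this condition holds by the definition of $R_j$. Next suppose that the condition holds for $j$. Then, by the chain rule for mutual information,

$$\sum_{i=j-1}^{2^{L-1}} R_i = I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_{j-1} \mid u_1, \ldots, u_{j-2}\big) + I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_j, \ldots, u_{2^{L-1}} \mid u_1, \ldots, u_{j-1}\big)$$
$$= I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_{j-1}, \ldots, u_{2^{L-1}} \mid u_1, \ldots, u_{j-2}\big).$$

Thus

$$\inf_{R : (R,d) \in \mathcal{RD}_{\rm in}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i \le \sum_{i=1}^{2^{L-1}} \alpha_i R_i = \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) I\big(x^{(L)}_j, \ldots, x^{(L)}_{2^{L-1}}; u_j, \ldots, u_{2^{L-1}} \mid u_1, \ldots, u_{j-1}\big)$$
$$= \sum_{j=1}^{2^{L-1}} (\alpha_j - \alpha_{j-1}) I\big(x^{(L)}_{A_j}; u_{A_j} \mid u_{A_j^c}\big).$$

By mimicking (45) through (51), one can show that

$$I\big(x^{(L)}_{A_j}; u_{A_j} \mid u_{A_j^c}\big) = \sum_{k=1}^{L} \sum_{i \in A_j^{(k)}} \Big( s^{(k)}_i - I\big(x^{(k)}_i; u_{A_j^c} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor}\big) \Big).$$

But by Lemma 6 and the observation in Appendix A.1.1,

$$I\big(x^{(k)}_i; u_{A_j^c} \mid x^{(k-1)}_{\lfloor (i+1)/2 \rfloor}\big) = f^{(k)}_{x_i}\big(s_{A_j^c}(x^{(k)}_i)\big).$$

It follows that

$$\inf_{R : (R,d) \in \mathcal{RD}_{\rm in}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i \le \inf_{R : (R,d) \in \mathcal{RD}_{\rm out}} \sum_{i=1}^{2^{L-1}} \alpha_i R_i + 2\epsilon.$$

Since $\epsilon$ was arbitrary, the proof is complete.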
C Proof of Theorem 1

We must show that $\mathcal{RD}^* \subseteq \mathrm{co}(\mathcal{RD}_{\rm in})$. Since both sets are convex, it suffices to show that for any componentwise nonnegative vector $(\beta_1, \ldots, \beta_{2^{L-1}}, \beta)$,

$$\inf_{(R,d) \in \mathcal{RD}^*} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d \ge \inf_{(R,d) \in \mathrm{co}(\mathcal{RD}_{\rm in})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d \qquad (92)$$
$$= \inf_{(R,d) \in \mathcal{RD}_{\rm in}} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d.$$

We shall assume that $\beta_1 \le \beta_2 \le \cdots \le \beta_{2^{L-1}}$; the other cases are similar. Let us temporarily use $\mathcal{RD}^*(K_x)$ to denote the rate-distortion region for the binary tree structure problem when the source variables have covariance matrix $K_x$, and similarly for $\mathcal{RD}_{\rm in}(K_x)$. If $K_x$ is such that all of the noise variances are positive, then (92) follows from Lemma 3.

If some of the noise variances are zero, then let $K_x^{(n)}$ be a sequence of source covariance matrices converging to $K_x$ such that for each $n$, $K_x^{(n)}$ corresponds to a source satisfying the binary tree structure for which all of the noise variances are positive. Then $\mathcal{RD}^*(K_x^{(n)}) = \mathrm{co}(\mathcal{RD}_{\rm in}(K_x^{(n)}))$ for each $n$, so

$$\inf_{(R,d) \in \mathcal{RD}^*(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d = \inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d.$$

We will first show that

$$\liminf_{n \to \infty} \inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d \ge \inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x)} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d. \qquad (93)$$

For each $n$, there exists a set of auxiliary random variables $u^{(n)}$ such that [HI, Lemma 3.3]

$$\inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d = \sum_{i=1}^{2^{L-1}} \beta_i I\big(x^{(L,n)}_1, \ldots, x^{(L,n)}_{2^{L-1}}; u^{(n)}_i \mid u^{(n)}_1, \ldots, u^{(n)}_{i-1}\big) + \beta E\Big[\big(x^{(1,n)}_1 - E[x^{(1,n)}_1 \mid u^{(n)}]\big)^2\Big]. \qquad (94)$$

Here $x^{(L,n)}_i$ denotes the $i$th variable at depth $L$ of the tree corresponding to covariance matrix $K_x^{(n)}$. Now the auxiliary random variables $u^{(n)}$ can be parametrized by a compact set, so consider a subsequence of $K_x^{(n)}$ along which $u^{(n)}$ converges in distribution to a limit $u$ and
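the right-hand side of (94) converges to the liminf. Then

$$\liminf_{n \to \infty} \inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d = \sum_{i=1}^{2^{L-1}} \beta_i I\big(x^{(L)}_1, \ldots, x^{(L)}_{2^{L-1}}; u_i \mid u_1, \ldots, u_{i-1}\big) + \beta E\Big[\big(x^{(1)}_1 - E[x^{(1)}_1 \mid u]\big)^2\Big]$$
$$\ge \inf_{(R,d) \in \mathcal{RD}_{\rm in}(K_x)} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d.$$

This establishes (93). On the other hand, Chen and Wagner [2] have shown that the rate-distortion region is inner-semicontinuous:

$$\limsup_{n \to \infty} \inf_{(R,d) \in \mathcal{RD}^*(K_x^{(n)})} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d \le \inf_{(R,d) \in \mathcal{RD}^*(K_x)} \sum_{i=1}^{2^{L-1}} \beta_i R_i + \beta d.$$

Together with (93), this establishes (92) and hence Theorem 1.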



D Proof of Theorem 2

It suffices to show (34). If $(R, d)$ is in $\mathcal{RD}_{\rm in}$, then there exist auxiliary random variables $u$ in $\mathcal{U}(d)$ such that

$$d \ge E\Big[\big(x^{(1)}_1 - E[x^{(1)}_1 \mid u]\big)^2\Big]$$

and

$$\sum_{i \in A} R_i \ge I\big(x^{(L)}_A; u_A \mid u_{A^c}\big)$$

for all $A$. Now for each $i$,

$$u_i = a_i x^{(L)}_i + w_i,$$

where $w_i$ is Gaussian and independent of $x^{(L)}_i$. Let $\tilde u_i$ be a quantized version of $\tilde x^{(L)}_i$ using the same test channel

$$\tilde u_i = a_i \tilde x^{(L)}_i + w_i.$$

Let $\mathrm{MMSE}(x^{(1)}_1 \mid u)$ denote the mean-square error of the minimum mean-square error (MMSE) estimate of $x^{(1)}_1$ given $u$. Likewise, let $\mathrm{LLSE}(x^{(1)}_1 \mid u)$ denote the mean-square error
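of the linear least-square error (LLSE) estimate of $x^{(1)}_1$ given $u$. Then

$$E\Big[\big(\tilde x^{(1)}_1 - E[\tilde x^{(1)}_1 \mid \tilde u]\big)^2\Big] = \mathrm{MMSE}(\tilde x^{(1)}_1 \mid \tilde u) \le \mathrm{LLSE}(\tilde x^{(1)}_1 \mid \tilde u) = \mathrm{LLSE}(x^{(1)}_1 \mid u) = \mathrm{MMSE}(x^{(1)}_1 \mid u) \le d.$$

Also, for any $A$,

$$\sum_{i \in A} R_i \ge I\big(x^{(L)}_A; u_A \mid u_{A^c}\big)$$
$$= h(u_A \mid u_{A^c}) - h(u_A \mid u_{A^c}, x^{(L)}_A)$$
$$= h(u_A \mid u_{A^c}) - h(u_A \mid x^{(L)}_A)$$
$$\ge h(\tilde u_A \mid \tilde u_{A^c}) - h(u_A \mid x^{(L)}_A)$$
$$= h(\tilde u_A \mid \tilde u_{A^c}) - h(\tilde u_A \mid \tilde x^{(L)}_A)$$
$$= h(\tilde u_A \mid \tilde u_{A^c}) - h(\tilde u_A \mid \tilde u_{A^c}, \tilde x^{(L)}_A)$$
$$= I\big(\tilde x^{(L)}_A; \tilde u_A \mid \tilde u_{A^c}\big),$$

where in the inequality we have used the fact that the Gaussian distribution maximizes entropy for a fixed covariance. It follows that $(R, d)$ is in $\widetilde{\mathcal{RD}}_{\rm in}$.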



E Proof of Proposition 1

Suppose that $x_1, \ldots, x_N$ can be embedded in a Gauss-Markov tree and fix distinct indices $i$, $j$, and $k$. Without loss of generality, we may assume that all variables in the tree have mean zero and variance one. Consider two paths (i.e., two sequences of variables), one from $x_i$ to $x_j$ and one from $x_i$ to $x_k$. Evidently both paths contain $x_i$; let $\bar x$ denote the last variable in the first path that is contained in the second. This is the point at which the two paths split, as shown in Fig. 12. Note that it is possible for $\bar x$ to equal $x_i$, $x_j$, or $x_k$.

Now since $\bar x$ is along the path from $x_i$ to $x_j$, it follows from the tree condition that $x_i \leftrightarrow \bar x \leftrightarrow x_j$. Likewise $x_i \leftrightarrow \bar x \leftrightarrow x_k$. Since all of the variables are standard Normals, this implies [H, (5.13)]

$$\rho_{ij} = E[x_i \bar x]\,E[\bar x x_j], \qquad (95)$$
$$\rho_{ik} = E[x_i \bar x]\,E[\bar x x_k]. \qquad (96)$$

Next consider the paths from $x_j$ to $x_i$ and from $x_j$ to $x_k$, and let $\hat x$ denote the last variable in the first path that is contained in the second. Then both $\bar x$ and $\hat x$ lie along the path from
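$x_i$ to $x_j$. If $\hat x \neq \bar x$, then the path from $\bar x$ to $x_k$ to $\hat x$ to $\bar x$ would form a loop, which is impossible since the graph is a tree. Thus $\hat x$ must equal $\bar x$. Thus $x_j \leftrightarrow \bar x \leftrightarrow x_k$ and

$$\rho_{jk} = E[x_j \bar x]\,E[\bar x x_k].$$

Combining this equation with (95) and (96) yields conditions (36) and (37): the product $\rho_{ij}\rho_{jk}\rho_{ik}$ is a product of squares, and $|\rho_{ij}\rho_{jk}| = |E[x_i \bar x]|\,E[\bar x x_j]^2\,|E[\bar x x_k]| \le |\rho_{ik}|$ because $|E[\bar x x_j]| \le 1$.

Figure 12: $\bar x$ is the point at which the two paths split.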

Now suppose that $N = 3$ and conditions (36) and (37) hold. If $\rho_{ij}$ is nonzero for all $i \neq j$, then

$$0 < \frac{\rho_{ij}\rho_{ik}}{\rho_{jk}} \le 1$$

for all distinct $i$, $j$, and $k$. This implies that $x_1$, $x_2$, and $x_3$ can be written

$$x_1 = \sqrt{\frac{\rho_{12}\rho_{13}}{\rho_{23}}}\,\mathrm{sgn}(\rho_{23}) \cdot x_0 + z_1,$$
$$x_2 = \sqrt{\frac{\rho_{12}\rho_{23}}{\rho_{13}}}\,\mathrm{sgn}(\rho_{13}) \cdot x_0 + z_2,$$
$$x_3 = \sqrt{\frac{\rho_{13}\rho_{23}}{\rho_{12}}}\,\mathrm{sgn}(\rho_{12}) \cdot x_0 + z_3,$$

where $\mathrm{sgn}(\cdot)$ is the signum function

$$\mathrm{sgn}(\rho) = \begin{cases} 1 & \text{if } \rho > 0, \\ 0 & \text{if } \rho = 0, \\ -1 & \text{if } \rho < 0, \end{cases}$$

and where $x_0, z_1, z_2, z_3$ are independent Gaussian random variables. Here $x_0$ is a standard Normal and the variances of the $z$s are chosen such that the $x$s have unit variance. It is readily verified that this construction yields the correct correlation coefficients among the $x$s. It is then clear that $x_0$ and the $x$s can be arranged in the Gauss-Markov tree shown in Figure 3.
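The construction can be checked numerically. The sketch below (function names ours) computes the loadings $c_i$ in $x_i = c_i x_0 + z_i$, the implied noise variances $1 - c_i^2$, and the resulting correlations $c_i c_j$, for one illustrative triple of nonzero correlations that satisfies (36) and (37).

```python
import math

def sgn(v):
    return (v > 0) - (v < 0)

def star_loadings(r12, r13, r23):
    """Loadings c_i in x_i = c_i * x0 + z_i for the three-leaf star of Appendix E.

    Assumes all correlations are nonzero and satisfy (36)-(37), so each ratio
    under the square root lies in (0, 1].
    """
    c1 = math.sqrt(r12 * r13 / r23) * sgn(r23)
    c2 = math.sqrt(r12 * r23 / r13) * sgn(r13)
    c3 = math.sqrt(r13 * r23 / r12) * sgn(r12)
    return c1, c2, c3

r12, r13, r23 = -0.6, 0.5, -0.7          # positive product, (36) holds
c = star_loadings(r12, r13, r23)
noise_vars = [1 - ci ** 2 for ci in c]    # Var(z_i) that gives unit-variance x_i
print(c, noise_vars)
print(c[0] * c[1], c[0] * c[2], c[1] * c[2])   # reproduces r12, r13, r23
```

The signum factors are what let negative correlations be reproduced: with a positive product $\rho_{12}\rho_{13}\rho_{23}$, the sign of $c_i c_j$ always matches the sign of $\rho_{ij}$.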

If, say, $\rho_{12} = 0$, then by condition (36), either $\rho_{13} = 0$ or $\rho_{23} = 0$. Suppose that $\rho_{13} = 0$. Then $x_1$ is uncorrelated with, and hence independent of, $x_2$ and $x_3$. It follows that the $x$s can be written

$$x_1 = z_1,$$
$$x_2 = \sqrt{|\rho_{23}|} \cdot x_0 + z_2,$$
$$x_3 = \sqrt{|\rho_{23}|} \cdot \mathrm{sgn}(\rho_{23}) \cdot x_0 + z_3,$$

so that $x_0$ and the $x$s can again be arranged in the Gauss-Markov tree shown in Figure 3.

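F Proof of Proposition 2

Since we are assuming that $R_3 = 0$, the problem effectively reduces to a two-encoder setup. By Lemma 1 and (20), the minimum $R_1 + R_2$ equals

$$\inf I(x; u)$$

subject to

$$u_1 \leftrightarrow x_1 \leftrightarrow x_2 \leftrightarrow u_2,$$
$$(x, u) \text{ jointly Gaussian},$$
$$E\big[(x_3 - E[x_3 \mid u])^2\big] \le d.$$

Without loss of generality, we may assume that

$$u_1 = x_1 + z_1,$$
$$u_2 = x_2 + z_2,$$

where the $z$ variables are Gaussian and independent of each other and of $x$. Let $z_1$ have variance $\alpha\sigma^2$ and $z_2$ have variance $\beta\sigma^2$.

Via straightforward calculations (recall that $2\sigma^2(1 - \rho) = 1$), one can show that

$$I(x; u) = \frac{1}{2}\log\big((1 - \rho^2)\alpha^{-1}\beta^{-1} + \alpha^{-1} + \beta^{-1} + 1\big) \qquad (97)$$

and

$$E\big[(x_3 - E[x_3 \mid u])^2\big] = 1 - \frac{1}{\sigma^2} \cdot \frac{2(1+\rho) + \alpha + \beta}{4(1+\alpha)(1+\beta) - 4\rho^2}.$$

Now

$$\frac{2(1+\rho) + \alpha + \beta}{4(1+\alpha)(1+\beta) - 4\rho^2} \le \frac{4 + \alpha + \beta}{4(1+\alpha)(1+\beta) - 4} = \frac{4 + \alpha + \beta}{4\alpha + 4\beta + 4\alpha\beta} \le \frac{1 + \alpha + \beta}{\alpha + \beta + \alpha\beta} \le \frac{1}{\alpha\beta} + 2.$$

It follows that as $\sigma^2$ tends to infinity, in order to continue to meet the distortion constraint, we require that $\alpha\beta$ tend to zero. But since $\alpha^{-1} + \beta^{-1} \ge 2(\alpha\beta)^{-1/2}$, this implies that $I(x; u)$ tends to infinity, by (97).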
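The two closed-form expressions can be cross-checked against direct covariance computations. The sketch below (names ours; natural logarithms) uses $\mathrm{Cov}(u) = \sigma^2 M$ with $M = \begin{bmatrix} 1+\alpha & \rho \\ \rho & 1+\beta \end{bmatrix}$ and $\mathrm{Cov}(u \mid x) = \sigma^2\,\mathrm{diag}(\alpha, \beta)$ for the test channels above. It computes $I(x; u)$ and the MMSE of $x_3$ from 2×2 determinants and inverses, then compares them with (97) and the distortion formula.

```python
import math

def direct(sigma2, alpha, beta):
    """I(x; u) and MMSE of x3 = x1 - x2 given u, from covariance algebra."""
    rho = 1 - 1 / (2 * sigma2)
    m11, m12, m22 = 1 + alpha, rho, 1 + beta
    det_m = m11 * m22 - m12 ** 2
    mi = 0.5 * math.log(det_m / (alpha * beta))     # det Cov(u) / det Cov(u | x)
    # Cov(x3, u) = sigma2 * (1 - rho) * [1, -1];  MMSE = Var(x3) - c^T Cov(u)^{-1} c
    c = sigma2 * (1 - rho)
    quad = (m22 + m11 + 2 * m12) / det_m            # [1, -1] M^{-1} [1, -1]^T
    mmse = 1.0 - c ** 2 * quad / sigma2
    return mi, mmse

def closed_form(sigma2, alpha, beta):
    """Right-hand sides of (97) and of the distortion formula."""
    rho = 1 - 1 / (2 * sigma2)
    mi = 0.5 * math.log((1 - rho ** 2) / (alpha * beta) + 1 / alpha + 1 / beta + 1)
    mmse = 1 - (2 * (1 + rho) + alpha + beta) / (
        sigma2 * (4 * (1 + alpha) * (1 + beta) - 4 * rho ** 2))
    return mi, mmse

print(direct(50.0, 0.01, 0.02), closed_form(50.0, 0.01, 0.02))
```

The two functions should print matching pairs for any $\sigma^2 > 1/2$ and positive $\alpha$, $\beta$.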
G Proof of Proposition 3

Since the average distortion is the same for all $\ell$, let us assume that $\ell = 1$ and write $x_3$ in place of $x_3(1)$, and likewise for the other variables. Then by the triangle inequality,

$$\sqrt{E[(x_3 - \hat x_3)^2]} \le \sqrt{E[(x_3 - (\hat x_1 - \hat x_2))^2]} + \sqrt{E[((\hat x_1 - \hat x_2) - \hat x_3)^2]}.$$

Now

$$|\hat x_1 - x_1| \le 2^{-(n+1)}$$

and likewise for $|\hat x_2 - x_2|$. Thus

$$E[(x_3 - (\hat x_1 - \hat x_2))^2] \le 2^{-2n}.$$

Define the event

$$\mathcal{A} = \{|x_1 - x_2| < 2^{m-1}\}.$$

Now on $\mathcal{A}$,

$$\hat x_3 = u_1 - u_2 \bmod \Lambda_c = \hat x_1 - \hat x_2 \bmod \Lambda_c = \hat x_1 - \hat x_2,$$

so

$$E[(\hat x_1 - \hat x_2 - \hat x_3)^2] = E[(\hat x_1 - \hat x_2 - \hat x_3)^2\,\mathbf{1}_{\mathcal{A}^c}] \le \sqrt{E[(\hat x_1 - \hat x_2 - \hat x_3)^4]\,P(\mathcal{A}^c)}.$$

But

$$|\hat x_1 - \hat x_2 - \hat x_3| \le |x_1 - x_2| + |\hat x_1 - x_1| + |\hat x_2 - x_2| + |\hat x_3| \le |x_1 - x_2| + 2^{-n} + 2^{m-1} \le |x_1 - x_2| + 2^m.$$

Since $x_1 - x_2$ is a standard Normal random variable, $E[(x_1 - x_2)^4] = 3$, and Minkowski's inequality implies

$$E[(\hat x_1 - \hat x_2 - \hat x_3)^4] \le (3^{1/4} + 2^m)^4.$$

It only remains to bound $P(\mathcal{A}^c)$. Using a well-known upper bound on the tail of the Gaussian distribution,

$$P(\mathcal{A}^c) \le 2\exp(-2^{2m-3}).$$

Combining these various bounds gives

$$E[(\hat x_3 - x_3)^2] \le \Big(2^{-n} + (3^{1/4} + 2^m)\big(2\exp(-2^{2m-3})\big)^{1/4}\Big)^2.$$

The right-hand side depends neither on $\ell$ nor on $\sigma^2$ and vanishes as $m$ and $n$ grow, so Proposition 3 follows.
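To get a feel for how quickly the bound decays, the sketch below evaluates the last bound in this appendix, written in the form $\big(2^{-n} + (3^{1/4} + 2^m)(2e^{-2^{2m-3}})^{1/4}\big)^2$, for a few $(n, m)$ pairs. The function name is ours. Neither $\sigma^2$ nor $\ell$ enters the computation.

```python
import math

def prop3_bound(n, m):
    """(2^-n + (3^(1/4) + 2^m) * (2 exp(-2^(2m-3)))^(1/4))^2."""
    tail = 2.0 * math.exp(-(2.0 ** (2 * m - 3)))
    return (2.0 ** (-n) + (3 ** 0.25 + 2.0 ** m) * tail ** 0.25) ** 2

for n, m in [(2, 2), (4, 3), (6, 4), (8, 5)]:
    print(n, m, n + m, prop3_bound(n, m))   # n + m = bits per sample per encoder
```

The bound is loose for very small $m$, but the tail term decays doubly exponentially in $m$, so once $m$ is moderate the $2^{-n}$ quantization term dominates.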
H Proof of Proposition 4

Recall we may assume that all of the variables have unit variance. By Proposition 1, if $x_1$, $x_2$, and $x_3$ cannot be embedded in a Gauss-Markov tree, then either

$$\rho_{12}\rho_{13}\rho_{23} < 0 \qquad (98)$$

or

$$|\rho_{ij}| < |\rho_{ik}\rho_{kj}| \qquad (99)$$

for some distinct $i$, $j$, and $k$.

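Suppose first that (98) holds. Then we must have $|\rho_{ij}| < 1$ for all $i \neq j$. Now

$$E[x_1 \mid x_2, x_3] = \frac{\rho_{12} - \rho_{13}\rho_{23}}{1 - \rho_{23}^2}\,x_2 + \frac{\rho_{13} - \rho_{12}\rho_{23}}{1 - \rho_{23}^2}\,x_3 \overset{\text{def}}{=} a_2 x_2 + a_3 x_3.$$

Then

$$a_2 \cdot a_3 \cdot \rho_{23} = \frac{\rho_{23}}{(1 - \rho_{23}^2)^2\,\rho_{13}\rho_{12}}\,(\rho_{12}^2 - \rho_{12}\rho_{13}\rho_{23})(\rho_{13}^2 - \rho_{12}\rho_{13}\rho_{23}), \qquad (100)$$

which is negative by (98): both factors in parentheses are positive because $\rho_{12}\rho_{13}\rho_{23} < 0$, while $\rho_{23}/(\rho_{12}\rho_{13})$ has the sign of $\rho_{12}\rho_{13}\rho_{23}$. This establishes the desired conclusion in this case. We will therefore assume throughout the remainder of the proof that $\rho_{12}\rho_{13}\rho_{23} \ge 0$.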

Suppose that (99) holds, say, for $i = 1$, $j = 2$, and $k = 3$. Then we must have $|\rho_{12}| < 1$ and $\rho_{13} \cdot \rho_{23} \neq 0$. Furthermore, if $|\rho_{23}| = 1$, then $|\rho_{12}| = |\rho_{13}|$, which would contradict (99). Thus we may assume that $|\rho_{23}| < 1$. First suppose that $\rho_{12} = 0$. Then

$$a_2 \cdot a_3 \cdot \rho_{23} = -\frac{\rho_{13}^2\rho_{23}^2}{(1 - \rho_{23}^2)^2},$$

which is negative. We will therefore focus on the case in which $\rho_{12}\rho_{13}\rho_{23} > 0$.

Next observe that since we are assuming that (99) holds for $i = 1$, $j = 2$, and $k = 3$, the opposite inequality must hold strictly in the other two cases:

$$|\rho_{13}| > |\rho_{12}\rho_{23}|,$$
$$|\rho_{23}| > |\rho_{12}\rho_{13}|.$$

This can be seen by contradiction: if, e.g., $|\rho_{13}| \le |\rho_{12}\rho_{23}|$, then combining this fact with (99) yields

$$|\rho_{12}| < |\rho_{13}\rho_{23}| \le |\rho_{12}|\,|\rho_{23}|^2,$$

which is evidently false since $|\rho_{23}| < 1$. From (100) and the three assumed conditions, $\rho_{12}\rho_{13}\rho_{23} > 0$, $|\rho_{12}| < |\rho_{13}\rho_{23}|$, and $|\rho_{13}| > |\rho_{12}\rho_{23}|$, it follows that $a_2 \cdot a_3 \cdot \rho_{23}$ is negative, as desired: the prefactor $\rho_{23}/(\rho_{12}\rho_{13})$ is positive, the first factor in parentheses is negative, and the second is positive.
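Identity (100) and the sign conclusions can be spot-checked numerically. The sketch below (helper names ours) computes $a_2, a_3$ from the formula for $E[x_1 \mid x_2, x_3]$, compares $a_2 a_3 \rho_{23}$ with the right-hand side of (100), and checks the sign for one illustrative correlation triple in each of the two cases of the proof.

```python
def coeffs(r12, r13, r23):
    """a2, a3 in E[x1 | x2, x3] = a2*x2 + a3*x3 for unit-variance sources."""
    d = 1 - r23 ** 2
    return (r12 - r13 * r23) / d, (r13 - r12 * r23) / d

def rhs_100(r12, r13, r23):
    """Right-hand side of (100); requires r12 and r13 nonzero and |r23| < 1."""
    p = r12 * r13 * r23
    return r23 * (r12 ** 2 - p) * (r13 ** 2 - p) / ((1 - r23 ** 2) ** 2 * r13 * r12)

case98 = (0.4, 0.5, -0.3)   # negative product of correlations, as in (98)
case99 = (0.1, 0.6, 0.7)    # |r12| < |r13 r23| with a positive product, as in (99)

for r in (case98, case99):
    a2, a3 = coeffs(*r)
    print(r, a2 * a3 * r[2], rhs_100(*r))   # equal, and negative in both cases
```

Both triples correspond to valid (positive definite) correlation matrices, and in each case the product $a_2 a_3 \rho_{23}$ is negative, which is condition (38) after relabeling.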